I'm currently using llama_index. I tested it with an 850 KB PDF file (5 pages). I generated the index for that file using default settings, then queried it using "text-davinci-003" with temperature=0.7, SentenceEmbeddingOptimizer, and response_mode="compact". The script takes around 20 seconds to finish. Is there anything I can do to reduce the response time?
Ah, that could work. Now that I think about it, the online service I tested for PDF Q&A displayed words one at a time, but I thought that was just a fancy effect they added for aesthetics. I'll try that, thanks.
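
For anyone wondering why word-by-word display helps, here is a minimal sketch of the idea. It does not call llama_index or OpenAI at all: `fake_token_stream` and `consume_stream` are hypothetical names, and the per-token delay is a made-up stand-in for generation time. The point is only that the total time stays roughly the same, while the time until the first word appears becomes much shorter.

```python
import time
from typing import Iterator, List, Tuple


def fake_token_stream(tokens: List[str], delay_s: float = 0.05) -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM response (not a real API)."""
    for token in tokens:
        time.sleep(delay_s)  # simulated per-token generation time (assumed value)
        yield token


def consume_stream(stream: Iterator[str]) -> Tuple[str, float, float]:
    """Print tokens as they arrive.

    Returns (full_text, time_to_first_token, total_time), both times in seconds.
    """
    start = time.perf_counter()
    first_token_at = None
    parts = []
    for token in stream:
        if first_token_at is None:
            # Record how long the user waited before seeing anything.
            first_token_at = time.perf_counter() - start
        print(token, end="", flush=True)
        parts.append(token)
    print()
    total = time.perf_counter() - start
    if first_token_at is None:
        first_token_at = total  # empty stream: nothing was ever shown
    return "".join(parts), first_token_at, total


if __name__ == "__main__":
    answer, ttft, total = consume_stream(
        fake_token_stream(["The ", "answer ", "is ", "here."])
    )
    print(f"first token after {ttft:.2f}s, full answer after {total:.2f}s")
```

With a real model the stream would come from whatever streaming option the library exposes, so check the docs for the version you have installed rather than relying on this stub.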