I'm currently trying to build a chatbot for our website using LlamaIndex and ChatGPT. Our chatbot has around 50 documents, each around 1-2 pages long, containing tutorials and other information from our site. While the answers I'm getting are great, the performance is slow: on average it takes around 15-20 seconds to retrieve an answer, which is not practical for our use case.
I have tried using Optimizers, as suggested in the documentation, but haven't seen much improvement. Currently, I am using GPTSimpleVectorIndex and haven't tested other indexes yet.
I'm pretty new to this and would like to hear whether these are expected times or whether they could be improved by, e.g., building the indices more efficiently, setting different params, etc. Basically, I'm looking for any suggestions on how to make the bot answer more quickly.
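For context on where the time goes: the vector-search step of an index like GPTSimpleVectorIndex is rarely the bottleneck. A minimal toy sketch of what it does at query time (stub 2-d embeddings and hypothetical names, not the LlamaIndex API) shows it's just a cosine-similarity top-k over chunk vectors:

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k_chunks(query_vec, chunk_vecs, chunks, k=3):
    # Rank every chunk by similarity to the query and keep the k best.
    scored = sorted(zip(chunks, chunk_vecs),
                    key=lambda cv: cosine(query_vec, cv[1]),
                    reverse=True)
    return [chunk for chunk, _ in scored[:k]]

# Toy data: three chunks with made-up 2-d embeddings.
chunks = ["tutorial A", "tutorial B", "tutorial C"]
vecs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]]
print(top_k_chunks([1.0, 0.1], vecs, chunks, k=2))  # ['tutorial A', 'tutorial B']
```

This whole step is local math over ~50 documents' worth of chunks, i.e. milliseconds; the 15-20 seconds are dominated by the LLM calls made afterwards to synthesize the answer.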
Yea, for the most part the main limitation on speed is how busy the OpenAI servers are haha
I saw you also tried playing with the top k and chunk size
If you decrease the chunk size to about 1024, set top k to 1-3, and also set response_mode="compact", that's about as fast + accurate as it can get with OpenAI
Got it! I will leave it like it is then. I think once streaming is working, the timing will not feel that bad. Thanks for the explanation @Logan M!
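On that last point, streaming doesn't make the total response any faster; it improves perceived latency because the first tokens arrive almost immediately. A toy sketch with a generator standing in for a streaming LLM response (hypothetical names, not the LlamaIndex API):

```python
import time

def fake_token_stream(answer, delay=0.01):
    # Stand-in for a streaming LLM response: yields tokens as they "arrive".
    for token in answer.split():
        time.sleep(delay)
        yield token

start = time.time()
first_token_at = None
received = []
for token in fake_token_stream("streaming makes the wait feel shorter"):
    if first_token_at is None:
        first_token_at = time.time() - start  # time to first token
    received.append(token)
total = time.time() - start

# Time to first token is a fraction of the total generation time,
# so the user starts reading long before the answer is complete.
print(f"first token after {first_token_at:.3f}s, full answer after {total:.3f}s")
```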