Hey, I've built a chatbot using llama-index, but I feel there is a lot of latency before I get an answer, even though I only use a small vector index. Btw I'm on the free Pinecone tier. Do you think the latency is mainly due to that?
Even after reducing the chunk size and similarity_top_k, the responses are still slow for a chatbot. I've seen chatbot apps like Botsonic with really fast response times. Do you think those are built on Langchain or LlamaIndex?
Any chatbot is going to be limited by how long the LLM call(s) take. Even with llama-index, the vast majority of the runtime is spent calling the LLM. When using OpenAI, LLM calls can take varying amounts of time depending on their server load.
Moving from llama-index to pure langchain likely won't improve latency, at least in my opinion.
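If the raw call time can't shrink, streaming usually helps the most with *perceived* latency: the user starts reading after the first token instead of waiting for the whole generation. A toy sketch of why that matters (a plain generator stands in for the LLM's token stream here, not a real API call):

```python
import time

def fake_token_stream(tokens, delay=0.01):
    # Hypothetical stand-in for an LLM token generator; with a real
    # streaming query engine you'd iterate the response's token stream
    # instead of this fake generator.
    for tok in tokens:
        time.sleep(delay)
        yield tok

start = time.time()
first_token_at = None
answer = []
for tok in fake_token_stream("The answer is 42 .".split()):
    if first_token_at is None:
        # The user starts reading here, long before generation finishes.
        first_token_at = time.time() - start
    answer.append(tok)
total_time = time.time() - start
# first_token_at is ~one token's delay; total_time is the full generation.
```

With the real thing it would be roughly `index.as_query_engine(streaming=True)` and then iterating `response.response_gen`, if I'm remembering the current llama-index API right.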
Hey Logan, I have an endpoint set up using text-generation-inference. Do you have any examples of making it work with llama-index? I've tried using langchain's HuggingFaceTextGenInference class, but it doesn't work out of the box.