Hi everyone! My current solution suffers from latency issues that negatively affect the user experience. We are using OpenAI with RAG, and since I'm new to this space and the project was handed over to me directly, I would appreciate suggestions or advice on which areas to look at to reduce the latency.
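One area that often pays off quickly is streaming the model's output, since it cuts perceived latency even when total generation time stays the same. Below is a minimal sketch assuming the official `openai` Python SDK (v1+); the model name and prompt are placeholders, not details from the actual project:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# stream=True makes tokens arrive as they are generated, so the UI can
# start rendering text immediately instead of waiting for the full reply.
stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name; use whatever you run
    messages=[{"role": "user", "content": "Answer using the retrieved context..."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

Beyond that, it's worth profiling each stage separately: retrieval time (vector store round trips), prompt size (very long contexts slow down time-to-first-token), and the model choice itself (smaller models respond faster).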
Hi everyone, I have seen the condense question + context mode for the chat engine in the LlamaIndex docs, but it seems to be shown only for OpenAI. Can somebody suggest whether it's achievable with an Anthropic LLM and LlamaIndex?
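For what it's worth, condense-plus-context in LlamaIndex isn't tied to OpenAI; the chat engine accepts any LLM object. Here's a minimal sketch assuming the llama-index >= 0.10 package layout (requires the `llama-index-llms-anthropic` and `llama-index-embeddings-huggingface` packages); the model names and the `./data` path are placeholder assumptions, and since Anthropic has no embedding API you pair it with a separate embedding model:

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.llms.anthropic import Anthropic
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Anthropic LLM for generation; model id is a placeholder.
Settings.llm = Anthropic(model="claude-3-5-sonnet-20241022")
# Anthropic has no embedding endpoint, so use a local embedding model.
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

documents = SimpleDirectoryReader("./data").load_data()  # placeholder path
index = VectorStoreIndex.from_documents(documents)

chat_engine = index.as_chat_engine(chat_mode="condense_plus_context")
print(chat_engine.chat("What does the document say about latency?"))
```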
I'm not using embeddings or any vector stores as of now, since I'm new to LLMs and have built mostly basic things without adding complexity, but I would love some new suggestions and learnings.