Hi everyone! My current solution suffers from latency issues that hurt the user experience. We are using OpenAI with RAG. I'm new to this space and the project was handed over to me directly, so I would appreciate suggestions or advice on which areas to look at to reduce latency.
Thanks for the response, @WhiteFang_Jr. As I have just taken over the project, I'm not yet familiar with all of its internal workings, and I'm not allowed to disclose implementation details. We are using the OpenAI APIs and Llama for RAG, feeding in docs for retrieval.
I would appreciate advice on areas to look at for improvement, or strategies for reducing latency.