
Updated 2 months ago

Reducing Latency Issues with OpenAI and RAG

Hi everyone! My current solution suffers from latency issues that hurt the user experience. We are using OpenAI with RAG, and since I'm new to this space and the project was handed directly to me, I would appreciate suggestions or advice on which areas to look at to reduce the latency.


WhiteFang_Jr

Hi, it would help if you could describe your project and problem statement in more detail.

Alish Satani

Thanks for the response, @WhiteFang_Jr. I have just taken over the project, so I'm not yet familiar with all of its internal workings, and I'm also not allowed to disclose implementation details. We are using the OpenAI APIs and Llama for RAG, feeding in documents for retrieval.

I would appreciate advice on which areas to look at for improvement, or on strategies for reducing the latency.

WhiteFang_Jr

How much time is it taking before it answers?

WhiteFang_Jr

Off the top of my head:

1. Check where the time is actually being spent, using observability tools like Arize Phoenix: https://docs.llamaindex.ai/en/stable/module_guides/observability/#arize-phoenix-local
2. Try streaming your response. It won't shorten the total generation time, but the first tokens reach the user much sooner, so the perceived latency drops a lot!

Point one will help you understand where the actual problem lies, and then maybe I can help you further!
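To illustrate why the streaming suggestion helps, here is a minimal sketch, assuming a simulated LLM that emits one word every 10 ms. `fake_llm_tokens`, `answer_blocking`, and `answer_streaming` are hypothetical helpers, not part of any library. The sketch compares how long the user waits before seeing any output: with a blocking call that is the full generation time, while with streaming it is roughly the delay of a single token. In a real app the token source would be a streaming API call (for example, the OpenAI chat completions endpoint with `stream=True`).

```python
import time
from typing import Iterator, Tuple


def fake_llm_tokens(text: str, delay_s: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM call: yields one word at a time."""
    for word in text.split():
        time.sleep(delay_s)  # simulate per-token generation time
        yield word + " "


def answer_blocking(text: str) -> Tuple[str, float, float]:
    """Collect the whole answer first; the user sees nothing until it is done.

    Returns (answer, seconds until first visible output, total seconds).
    """
    start = time.perf_counter()
    answer = "".join(fake_llm_tokens(text))
    total_s = time.perf_counter() - start
    return answer, total_s, total_s


def answer_streaming(text: str) -> Tuple[str, float, float]:
    """Forward each token as soon as it arrives.

    Returns (answer, seconds until first visible output, total seconds).
    """
    start = time.perf_counter()
    first_token_s = None
    parts = []
    for token in fake_llm_tokens(text):
        if first_token_s is None:
            first_token_s = time.perf_counter() - start
        parts.append(token)  # in a real app, push this token to the UI here
    total_s = time.perf_counter() - start
    if first_token_s is None:  # empty answer: nothing was ever shown early
        first_token_s = total_s
    return "".join(parts), first_token_s, total_s


if __name__ == "__main__":
    text = " ".join(f"word{i}" for i in range(20))
    for name, fn in [("blocking", answer_blocking), ("streaming", answer_streaming)]:
        _, first_s, total_s = fn(text)
        print(f"{name:>9}: first output after {first_s * 1000:.0f} ms, total {total_s * 1000:.0f} ms")
```

Both variants take about the same total time; only the wait before the first visible output changes, which is usually what users perceive as latency.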

Alish Satani

Thanks @WhiteFang_Jr, I will check the time consumed by each part of the process.
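A lightweight way to get a first per-stage breakdown, alongside a tracing tool like Arize Phoenix, is to wrap each stage in a timer. This is a minimal sketch assuming a three-stage pipeline (query embedding, retrieval, generation); `embed_query`, `retrieve`, `generate`, and `answer_with_timings` are hypothetical stand-ins that only sleep, so they would need to be replaced with the project's real calls.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List, Tuple


@contextmanager
def timed(timings: Dict[str, float], stage: str) -> Iterator[None]:
    """Add the wall-clock seconds spent inside the `with` block to timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start


# Hypothetical pipeline stages (assumptions, not the project's code):
# swap each sleep for the real embedding, retrieval, and OpenAI calls.
def embed_query(question: str) -> List[float]:
    time.sleep(0.01)
    return [0.0, 1.0]


def retrieve(vector: List[float]) -> List[str]:
    time.sleep(0.01)
    return ["doc chunk A", "doc chunk B"]


def generate(question: str, chunks: List[str]) -> str:
    time.sleep(0.1)
    return f"Answer to {question!r} using {len(chunks)} chunks"


def answer_with_timings(question: str) -> Tuple[str, Dict[str, float]]:
    """Run the pipeline once and return the answer plus seconds per stage."""
    timings: Dict[str, float] = {}
    with timed(timings, "embed"):
        vector = embed_query(question)
    with timed(timings, "retrieve"):
        chunks = retrieve(vector)
    with timed(timings, "generate"):
        answer = generate(question, chunks)
    return answer, timings


if __name__ == "__main__":
    answer, timings = answer_with_timings("What is our refund policy?")
    # Print the slowest stage first, since that is where to start optimizing.
    for stage, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{stage:>10}: {seconds * 1000:.1f} ms")
```

In RAG setups the generation step is often the largest share, but the point of measuring is to confirm that rather than assume it; a slow retriever or an oversized context can dominate instead.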
