
Updated 4 months ago

any recommendation for a good solution


Any recommendation for a good solution to monitor latency during RAG? Currently it takes me 10~20 seconds to generate a response, and I want to figure out which stage is causing the latency.

5 comments

bbmax

That's some good stuff, here are our observability docs: https://gpt-index.readthedocs.io/en/stable/module_guides/observability/observability.html

I'm not familiar with all of the partners LlamaIndex supports, but I bet some of them will handle latency.

bbmax

You can also tie into any event with callbacks: https://gpt-index.readthedocs.io/en/stable/module_guides/observability/callbacks/root.html

bbmax

https://gpt-index.readthedocs.io/en/stable/examples/callbacks/LlamaDebugHandler.html

bbmax

might be helpful!!
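If you want a quick check before wiring up any of the above, here is a minimal sketch that times each stage by hand using only the Python standard library. It does not use LlamaIndex at all: `StageTimer`, `fake_retrieve`, `fake_generate`, and `timed_query` are hypothetical names made up for illustration, and the two `fake_*` functions just sleep in place of your real retrieval and LLM calls.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self):
        self.totals = defaultdict(float)  # stage name -> total seconds
        self.counts = defaultdict(int)    # stage name -> number of calls

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the stage raised.
            self.totals[name] += time.perf_counter() - start
            self.counts[name] += 1

    def report(self):
        # Return (stage, seconds) pairs, slowest stage first.
        return sorted(self.totals.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical stand-ins: swap in your real retriever and LLM calls.
def fake_retrieve(query):
    time.sleep(0.01)
    return ["chunk-1", "chunk-2"]


def fake_generate(query, chunks):
    time.sleep(0.05)
    return f"answer to {query!r} from {len(chunks)} chunks"


def timed_query(timer, query):
    with timer.stage("retrieve"):
        chunks = fake_retrieve(query)
    with timer.stage("generate"):
        answer = fake_generate(query, chunks)
    return answer


if __name__ == "__main__":
    timer = StageTimer()
    print(timed_query(timer, "what is RAG?"))
    for name, seconds in timer.report():
        print(f"{name}: {seconds:.3f}s")
```

Wrapping embedding, retrieval, reranking, and generation in separate `stage(...)` blocks should show which one dominates the 10~20 seconds; the callback handlers linked above can then give finer-grained event timings.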

Tony L

Thanks! That's helpful.
