
Updated 3 months ago

Calls

I get the following trace when using the callback manager. I'm only running a single query through the query_engine, but the trace shows it hitting the LLM more than once, which makes the response time longer. Why is that, and what can cause it? @WhiteFang_Jr @Logan M

********
Trace: query
    |_query -> 6.464402 seconds
      |_synthesize -> 5.598681 seconds
        |_templating -> 2.5e-05 seconds
        |_llm -> 2.454756 seconds
        |_templating -> 2.8e-05 seconds
        |_llm -> 3.094954 seconds
********
1 comment
This is expected. If more nodes are retrieved than can fit into one LLM call, the query engine makes multiple calls to refine the answer, so that the LLM can read all of the retrieved text.
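
To make the refine behaviour concrete, here is a minimal sketch in plain Python. It is not the library's actual implementation: `pack_chunks`, `refine_answer`, the character budget `max_chars`, and the prompt wording are all hypothetical names and simplifications chosen for illustration (a real setup would budget in tokens, not characters). It only shows why two batches of retrieved text lead to two templating + llm steps, as in the trace above.

```python
from typing import Callable, List, Optional, Tuple


def pack_chunks(chunks: List[str], max_chars: int) -> List[str]:
    """Greedily merge retrieved chunks into batches that fit a per-call budget.

    Hypothetical helper: the budget is in characters only to keep the sketch simple.
    """
    batches: List[str] = []
    current = ""
    for chunk in chunks:
        candidate = chunk if not current else current + "\n\n" + chunk
        if current and len(candidate) > max_chars:
            # The next chunk does not fit, so close this batch and start a new one.
            batches.append(current)
            current = chunk
        else:
            current = candidate
    if current:
        batches.append(current)
    return batches


def refine_answer(
    question: str,
    chunks: List[str],
    llm: Callable[[str], str],
    max_chars: int,
) -> Tuple[Optional[str], int]:
    """Answer from the first batch, then refine once per remaining batch.

    Returns the final answer and the number of LLM calls that were made.
    """
    answer: Optional[str] = None
    calls = 0
    for batch in pack_chunks(chunks, max_chars):
        if answer is None:
            # First call: plain question-answering prompt.
            prompt = f"Context:\n{batch}\n\nQuestion: {question}\nAnswer:"
        else:
            # Later calls: ask the model to refine the existing answer.
            prompt = (
                f"Question: {question}\n"
                f"Existing answer: {answer}\n"
                f"New context:\n{batch}\n"
                "Refine the existing answer if the new context helps:"
            )
        answer = llm(prompt)
        calls += 1
    return answer, calls


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs offline."""
    return f"answer based on {len(prompt)} prompt characters"


if __name__ == "__main__":
    retrieved = ["a" * 60, "b" * 60]  # two retrieved nodes
    _, n_calls = refine_answer("What is X?", retrieved, fake_llm, max_chars=100)
    print(n_calls)  # both nodes do not fit in one call, so the LLM is hit twice
```

Under these assumptions, the number of LLM calls equals the number of batches, so anything that shrinks the retrieved text relative to the context window (for example retrieving fewer nodes with a lower `similarity_top_k`, or using smaller chunks) should reduce the call count and the latency.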