At a glance

The community member is seeing the query engine hit the language model (LLM) more than once, which increases the response time. The trace shows the query, synthesize, and LLM calls each taking several seconds. A community member explains that this is expected behavior: if more nodes are retrieved than fit into a single LLM call, the system makes multiple calls to refine the answer so the LLM can read all of the retrieved text.

I get the following trace when using the callback manager. I'm just running a query from the query_engine, and I'm seeing that it hits the LLM more than once, which makes the response time longer. Why is that? What can cause that to happen? @WhiteFang_Jr @Logan M

Plain Text
********
Trace: query
    |_query -> 6.464402 seconds
      |_synthesize -> 5.598681 seconds
        |_templating -> 2.5e-05 seconds
        |_llm -> 2.454756 seconds
        |_templating -> 2.8e-05 seconds
        |_llm -> 3.094954 seconds
********
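For reference, a trace like the one above is what LlamaIndex's debug callback handler prints when a query finishes. A minimal sketch of that setup, assuming a recent llama-index with the core callbacks module; the data directory and query string are placeholders:

Python
********
# Minimal sketch: print a per-query event trace (query -> synthesize ->
# templating/llm) using LlamaIndex's debug callback handler.
# Assumes llama-index >= 0.10; "./data" and the query are placeholders.
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler

# Register a debug handler that prints the trace after each query.
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
Settings.callback_manager = CallbackManager([llama_debug])

# Build a simple index and run a query through the default query engine.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()
response = query_engine.query("What does the document say about X?")
********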
1 comment
This is expected. If more nodes are retrieved than can fit into one LLM call, it makes multiple calls to refine the answer, so that the LLM can read all of the retrieved text.
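If latency matters, two common levers are retrieving fewer nodes and using the "compact" response mode, which packs as much retrieved text as possible into each prompt before resorting to extra refine calls. A minimal sketch, assuming an existing index like the one built above (compact is already the default in recent llama-index versions):

Python
********
# Two ways to reduce the number of LLM calls per query.
# "index" is assumed to be the VectorStoreIndex built earlier.

# 1) Retrieve fewer nodes so everything fits into a single prompt.
query_engine = index.as_query_engine(similarity_top_k=2)

# 2) Use the "compact" response mode, which stuffs as many retrieved
#    chunks as possible into each prompt before refining further.
query_engine = index.as_query_engine(response_mode="compact")

response = query_engine.query("What does the document say about X?")
********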