Updated 2 months ago

LLM calls

Can anyone here help me understand how many LLM calls happen when we call a query engine's query method once? (say a retriever query engine with default params)
4 comments
It depends on the top k, your chunk size, and how much input the LLM can fit

LlamaIndex will take all the retrieved nodes and "compact" the prompt, stuffing as much text as possible into each LLM call

If all the text fits in one prompt, then it's a single LLM call. With default settings, this should always be true
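To make the "compact" idea above concrete, here is a rough back-of-the-envelope sketch in plain Python. It does not use LlamaIndex itself: `estimate_llm_calls`, `prompt_overhead`, and `reserved_output` are hypothetical names, and the token numbers are illustrative assumptions, not library defaults. It only models the idea that the retrieved text is pooled and split into as few prompts as the context window allows.

```python
import math
from typing import List


def estimate_llm_calls(
    node_tokens: List[int],
    context_window: int,
    prompt_overhead: int = 200,   # assumed size of the prompt template + query
    reserved_output: int = 256,   # assumed room left for the model's answer
) -> int:
    """Estimate LLM calls if retrieved text is packed as tightly as possible."""
    # Tokens available for retrieved text in each call.
    budget = context_window - prompt_overhead - reserved_output
    if budget <= 0:
        raise ValueError("context window too small for template and output")
    # Pool all retrieved text and split it into budget-sized pieces.
    return math.ceil(sum(node_tokens) / budget)


# Two retrieved chunks of ~1024 tokens fit in one 4096-token prompt.
print(estimate_llm_calls([1024, 1024], context_window=4096))  # 1
# Ten such chunks no longer fit, so the text spills into more calls.
print(estimate_llm_calls([1024] * 10, context_window=4096))   # 3
```

So a small top k with moderate chunk sizes stays at one call, and the count only grows once the retrieved text exceeds what a single prompt can hold.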
Is it different or the same for Agents?
With Agents, it's a bit different. Usually these work in rounds/iterations, where each iteration requires an LLM call.
Yeah, as Andrei mentioned, agents can involve more calls. At a minimum there's 3 -- one to read the chat history + latest message, to either write a response or call a tool. One to call the tool. And a last one to either write a final response (or continue the loop and call another tool).
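The rounds/iterations idea can be sketched with a toy loop. This is not LlamaIndex's agent code: `run_agent` is a hypothetical helper, and the "LLM" is replaced by a scripted list of decisions so the example runs on its own. It only shows that every round costs one LLM call, whether that round ends in a tool call or in the final answer.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Each scripted decision is (action, argument). The action is either a
# tool name or "final", in which case the argument is the answer text.
Decision = Tuple[str, Any]


def run_agent(
    scripted_decisions: List[Decision],
    tools: Dict[str, Callable[[Any], Any]],
    max_iterations: int = 10,
) -> Tuple[Optional[str], int]:
    """Run a toy agent loop and return (final_answer, llm_calls)."""
    llm_calls = 0
    observations: List[Any] = []  # tool results a real LLM would read next round
    for action, argument in scripted_decisions[:max_iterations]:
        llm_calls += 1  # one LLM call per round to decide what to do
        if action == "final":
            return argument, llm_calls
        # Executing the tool is ordinary code, not an LLM call.
        observations.append(tools[action](argument))
    return None, llm_calls  # ran out of rounds without a final answer


tools = {
    "add": lambda args: args[0] + args[1],
    "multiply": lambda args: args[0] * args[1],
}
script = [("add", (2, 3)), ("multiply", (5, 4)), ("final", "The result is 20")]
answer, calls = run_agent(script, tools)
print(answer, calls)  # The result is 20 3
```

Each extra tool the agent decides to use adds another round, which is why agents usually cost more calls than a plain query engine.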