The post asks how many LLM (Large Language Model) calls happen during a single call to a query engine's query method, for example a retriever query engine with default parameters. The comments provide the following insights:
One community member explains that the number of LLM calls depends on factors like the top-k value, the chunk size, and how much input the LLM can fit in its context window. They mention that LlamaIndex will "compact" the prompt, fitting as much retrieved text as possible into each LLM call. If all the text fits in one call, then a query costs a single LLM call with the default settings.
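The packing idea can be illustrated with a rough estimate. The sketch below is not LlamaIndex code: `estimate_llm_calls` and `budget_tokens` are hypothetical names, and greedy whole-chunk packing is an assumed simplification of how compaction behaves. It only shows why top-k, chunk size, and context size together determine the call count.

```python
from typing import List


def estimate_llm_calls(chunk_tokens: List[int], budget_tokens: int) -> int:
    """Greedily pack retrieved chunks into prompts and count the prompts.

    chunk_tokens: token count of each retrieved chunk (its length is top-k).
    budget_tokens: hypothetical room left for context in one prompt, after
        the question and the prompt template are accounted for.
    """
    if budget_tokens <= 0:
        raise ValueError("budget_tokens must be positive")

    calls = 0
    used = 0  # tokens already packed into the current prompt
    for size in chunk_tokens:
        if size > budget_tokens:
            raise ValueError("a single chunk exceeds the prompt budget")
        if calls == 0 or used + size > budget_tokens:
            calls += 1  # start a new prompt, i.e. one more LLM call
            used = 0
        used += size
    return calls


# Two 1024-token chunks fit in one 3000-token budget: one call.
print(estimate_llm_calls([1024, 1024], budget_tokens=3000))  # 1
# Five such chunks pack as 2 + 2 + 1: three calls.
print(estimate_llm_calls([1024] * 5, budget_tokens=3000))  # 3
```

Under these assumptions, raising top-k or the chunk size, or using a model with a smaller context window, is what pushes a query past a single call.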
Another community member asks whether the same holds for Agents. The response is that Agents behave a bit differently: they usually work in rounds or iterations, and each iteration requires an LLM call.
A final community member elaborates, stating that with Agents there is a minimum of 3 LLM calls: one to read the chat history and latest message and decide whether to write a response or call a tool, one to call the tool, and a last one to either write a final response or continue the loop and call another tool.
There is no explicitly marked answer in the provided information.
Yeah, as Andrei mentioned, agents can involve more calls. At a minimum there are 3: one to read the chat history + latest message, to either write a response or call a tool. One to call the tool. And a last one to either write a final response or continue the loop and call another tool.
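The breakdown above can be turned into a toy counter. This is a minimal sketch that takes the three steps at face value; `run_toy_agent` and `planned_tools` are hypothetical names, no real LLM or LlamaIndex agent is involved, and actual agent implementations may count their calls differently.

```python
from typing import Callable, Dict, List


def run_toy_agent(planned_tools: List[str],
                  tools: Dict[str, Callable[[], str]]) -> int:
    """Simulate an agent loop and return the number of (fake) LLM calls.

    planned_tools: the tools the pretend LLM decides to use, in order.
    tools: mapping from tool name to a plain Python function.
    """
    llm_calls = 0
    remaining = list(planned_tools)

    # Read the chat history + latest message and decide: respond or use a tool.
    llm_calls += 1
    while remaining:
        name = remaining.pop(0)
        # One call to produce the tool call itself.
        llm_calls += 1
        tools[name]()  # running the tool is ordinary code, not an LLM call
        # One call to read the tool output and either write the final
        # response or continue the loop with another tool.
        llm_calls += 1
    return llm_calls


toy_tools = {"search": lambda: "some search result"}
print(run_toy_agent(["search"], toy_tools))            # 3
print(run_toy_agent(["search", "search"], toy_tools))  # 5
```

With this reading, a turn that uses one tool costs 3 calls, and each extra trip around the loop adds two more.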