Could anyone explain what a LlamaIndex Agent expects from an astream_chat() call when the LLM decides to use tools? For context: in a non-streaming call, my backend parses the tool calls out of the LLM response and sends them back to the agent in the request.tools list.
I'm trying to determine whether, on my backend, I should:
- Detect the start of a tool call, stop yielding tokens to the agent, buffer until I hit a stop token, then parse out the tool calls and send the final buffered request with request.tools included,
or
- Keep streaming tokens to the agent as they arrive, and attach request.tools only to the final response.
Which approach is correct?
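For reference, here is a minimal sketch of option 1 (buffer-then-flush) as I picture it on my side. It assumes, as I understand the OpenAI-style integrations, that parsed tool calls ride along in message.additional_kwargs["tool_calls"]; TOOL_CALL_MARKER, STOP_TOKEN, raw_token_stream, and parse_tool_calls are hypothetical stand-ins for my backend's own pieces, not anything LlamaIndex defines.

```python
import json

from llama_index.core.llms import ChatMessage, ChatResponse, MessageRole

# Hypothetical markers my model emits around a tool call (placeholders).
TOOL_CALL_MARKER = "<tool_call>"
STOP_TOKEN = "</tool_call>"


def parse_tool_calls(text: str) -> list:
    """Hypothetical parser: pull the tool-call JSON out of the buffered text."""
    start = text.index(TOOL_CALL_MARKER) + len(TOOL_CALL_MARKER)
    end = text.index(STOP_TOKEN)
    return [json.loads(text[start:end])]


async def astream_chat_sketch(raw_token_stream):
    """Option 1: stream deltas until a tool call appears, then buffer and
    emit one final response carrying the parsed tool calls."""
    buffer, content = "", ""
    in_tool_call = False
    async for token in raw_token_stream:
        buffer += token
        if not in_tool_call and TOOL_CALL_MARKER in buffer:
            in_tool_call = True  # stop yielding deltas from here on
        if in_tool_call:
            if STOP_TOKEN in buffer:
                break  # full tool call buffered; fall through to the flush
            continue
        content += token
        yield ChatResponse(
            message=ChatMessage(role=MessageRole.ASSISTANT, content=content),
            delta=token,
        )
    if in_tool_call:
        # Final buffered response; assuming the parsed tool calls belong in
        # additional_kwargs["tool_calls"], mirroring the OpenAI-style layout.
        yield ChatResponse(
            message=ChatMessage(
                role=MessageRole.ASSISTANT,
                content=content,
                additional_kwargs={"tool_calls": parse_tool_calls(buffer)},
            ),
            delta="",
        )
```

Option 2 would be the same loop minus the buffering branch: yield every token as a delta and attach the parsed tool calls only to the last ChatResponse. I'm unsure which of the two the agent actually expects.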