I am building a multi-tool AI agent with three tools (two QueryEngine Tools and one Function Tool), but instead of returning an answer, the agent diverges from the original query.

At a glance

The community member is building a multi-tool AI agent with three tools: two QueryEngine Tools and one Function Tool. They used llama_index evaluation metrics to measure the model's performance, which was good. However, the agent answers the given query internally but does not return an answer; instead it diverges from the original query, either hitting the maximum number of iterations or producing no response.

In the comments, another community member suggests the issue may stem from the output parser checking for "Action:" before "Answer:", so it fails to detect the end of the ReAct loop. They recommend trying a different LLM, since open-source LLMs often make poor agents, or alternatively writing a custom output parser and passing it in.

The original community member responds that they have limited options: they are running a Microsoft LLM served with vLLM for better performance, and they are working on a Colab free-tier GPU. They say they will try making a custom output parser and ask for other suggestions, such as whether decreasing max_new_tokens or fine-tuning the LLM on their data would help.

Hi everyone, no doubt llama_index is an amazing tool to explore, but I am facing an issue.
Detailed description of what has been done
I am building a multi-tool AI agent with three different tools:
two QueryEngine Tools and one Function Tool.

I used llama_index evaluation metrics to measure my model's performance, and it is doing well. The only issue I am facing is that my agent answers the given query but does not return any answer; instead it diverges from the original query, ending up either hitting the maximum iterations or giving no response.
I am attaching a screenshot of my agent's inference to give a clear picture of what is happening.

Please help me out with this. I have tried changing prompt templates and the contexts of the tools, engine, and agent.
I also tried making a wrapper class that sets the active query to None if the agent diverges from the original query, but nothing is working.


Any help or insights would be very useful to me.
Thank you.
Attachment: Screenshot_2024-12-29_212437.png
2 comments
You can see the LLM wrote the answer, and then a whole lot more underneath.

If you look at the source code, the output parser checks for Action: before Answer:, so it does not detect the end of the ReAct loop.
https://github.com/run-llama/llama_index/blob/fd1edffd20cbf21085886b96b91c9b837f80a915/llama-index-core/llama_index/core/agent/react/output_parser.py#L104

Have you tried just using a different LLM? tbh open-source LLMs make terrible agents

Alternatively, you could write your own output parser and pass it in (using the above as the base?)
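Something along these lines could be a starting point (just a sketch, not tested; it assumes the ReActOutputParser / ResponseReasoningStep classes and the output_parser kwarg available in recent llama_index versions):

```python
from llama_index.core.agent.react.output_parser import ReActOutputParser
from llama_index.core.agent.react.types import BaseReasoningStep, ResponseReasoningStep


class AnswerFirstParser(ReActOutputParser):
    """Treat the first 'Answer:' as the end of the ReAct loop, even if the
    LLM keeps generating extra 'Thought:'/'Action:' text afterwards."""

    def parse(self, output: str, is_streaming: bool = False) -> BaseReasoningStep:
        if "Answer:" in output:
            # Keep only the text after the first "Answer:" and drop any
            # trailing ReAct keywords the model rambled on with.
            answer = output.split("Answer:", 1)[1]
            for token in ("Thought:", "Action:", "Observation:"):
                answer = answer.split(token, 1)[0]
            return ResponseReasoningStep(
                thought="Answer detected in output.",
                response=answer.strip(),
                is_streaming=is_streaming,
            )
        # Otherwise fall back to the stock Thought/Action parsing.
        return super().parse(output, is_streaming=is_streaming)


# Assumed wiring -- recent ReActAgent.from_tools() accepts an output_parser kwarg:
# agent = ReActAgent.from_tools(tools, llm=llm, output_parser=AnswerFirstParser(), verbose=True)
```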
First of all, thanks a lot for the quick response.
Yes, I tried some open-source LLMs, but I have very limited options as I am serving the LLMs with vLLM for better performance.
Sorry about the limited resources; I am working on a Colab free-tier GPU.

The model I am currently using is microsoft/Phi-3-mini-4k-instruct.

I will try making a custom output parser for sure.

Would decreasing max_new_tokens or fine-tuning the LLM on the data help? For context, here is roughly how the model is wired up (a minimal sketch below; it assumes the llama-index-llms-vllm wrapper, and parameter names such as max_new_tokens may vary across versions).
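```python
# Minimal sketch of the current setup (assumes the llama-index-llms-vllm
# integration; parameter names may differ by version).
from llama_index.core import Settings
from llama_index.llms.vllm import Vllm

llm = Vllm(
    model="microsoft/Phi-3-mini-4k-instruct",
    max_new_tokens=512,   # the value I am thinking of lowering
    temperature=0.1,
    dtype="half",         # to fit the Colab free-tier GPU
)
Settings.llm = llm
```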
It would be great if you could suggest some alternative approaches, or I can share my code if that helps.

Thanks a lot for your time @Logan M !!