
I am building a multi-tool AI agent with three different tools (two QueryEngine tools and one FunctionTool), but the agent writes an answer and then diverges from the original query instead of returning it.

Hi everyone, llama_index is no doubt an amazing tool to explore, but I am facing an issue.
Detailed description of what has been done:
I am building a multi-tool AI agent with three different tools:
two QueryEngine tools and one FunctionTool.

I used llama_index evaluation metrics to measure my model's performance, and it is doing well. The only issue I am facing is that the agent writes an answer to the given query but never returns it; instead it diverges from the original query, ending up either hitting the maximum number of iterations or producing no response.
I am attaching a screenshot of my agent's inference to give a clear picture of what is happening.

Please help me out with this. I have tried changing the prompt templates and the contexts of the tools, the engine, and the agent.
I also tried making a wrapper class that sets the active query to None whenever the agent diverges from the original query, but nothing is working.


Any help or insights would be very useful to me.
Thank you.
Attachment: Screenshot_2024-12-29_212437.png
2 comments
You can see the LLM wrote the answer, and then a whole lot more underneath.

If you look at the source code, the output parser checks for `Action:` before `Answer:`, so when the LLM emits both in one completion it never detects the end of the ReAct loop.
https://github.com/run-llama/llama_index/blob/fd1edffd20cbf21085886b96b91c9b837f80a915/llama-index-core/llama_index/core/agent/react/output_parser.py#L104
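For illustration, a completion shaped like this (hypothetical text) gets parsed as another tool call rather than a final answer, because the parser finds the trailing `Action:` first:

```
Thought: I can answer without using any more tools.
Answer: The warranty covers repairs for 12 months.
Thought: I should double-check this with a tool.
Action: query_engine_tool
Action Input: {"input": "warranty period"}
```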

Have you tried just using a different LLM? tbh open-source LLMs make terrible agents

Alternatively, you could write your own output parser and pass it in (using the above as the base?)
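Something like this could work as a starting point (a rough sketch; the class name and string handling here are mine, and it assumes the `ReActOutputParser` interface at the commit linked above, so verify against your installed version):

```python
from llama_index.core.agent.react.output_parser import ReActOutputParser
from llama_index.core.agent.react.types import (
    BaseReasoningStep,
    ResponseReasoningStep,
)


class AnswerFirstOutputParser(ReActOutputParser):
    """Treat `Answer:` as final even when the LLM rambles on and emits
    another `Action:` afterwards (the default parser checks `Action:` first)."""

    def parse(self, output: str, is_streaming: bool = False) -> BaseReasoningStep:
        if "Answer:" in output:
            # Everything between "Answer:" and any trailing "Action:" is the
            # final response; the trailing junk is dropped.
            thought, _, rest = output.partition("Answer:")
            answer = rest.split("Action:", 1)[0].strip()
            return ResponseReasoningStep(
                thought=thought.replace("Thought:", "").strip(),
                response=answer,
                is_streaming=is_streaming,
            )
        # Otherwise fall back to the default Thought/Action handling.
        return super().parse(output, is_streaming=is_streaming)
```

You would then pass it in when building the agent, e.g. `ReActAgent.from_tools(tools, llm=llm, output_parser=AnswerFirstOutputParser(), verbose=True)`.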
First of all, thanks a lot for the quick response.
Yes, I tried some open-source LLMs, but I have very limited options since I am serving the models with vLLM for better performance.
Sorry about the limited resources; I am working on the Colab free-tier GPU.

The model I am currently using is microsoft/Phi-3-mini-4k-instruct.

I will try making a custom output parser for sure.

Would decreasing max_new_tokens or fine-tuning the LLM on the data help?
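(For reference, capping generation and adding a stop string can be set on the vLLM wrapper; a minimal sketch, assuming the `llama-index-llms-vllm` package and that its `Vllm` class exposes `max_new_tokens` and `stop` fields, which you should check against your installed version:)

```python
from llama_index.llms.vllm import Vllm

# Sketch: cap generation so the model cannot ramble far past its answer,
# and stop as soon as it starts hallucinating an "Observation:" block.
# (`max_new_tokens` and `stop` are assumed field names; check your version.)
llm = Vllm(
    model="microsoft/Phi-3-mini-4k-instruct",
    max_new_tokens=256,
    temperature=0.1,
    stop=["Observation:"],
)
```

A smaller max_new_tokens will not fix the parser ordering by itself, but it does limit how much the model can append after its `Answer:`.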
It would be great if you could suggest some alternative approaches, or I can share my code if that helps.

Thanks a lot for your time @Logan M !!