Hey, I'm trying ReAct agent + gpt-4 vs llama2-7b

At a glance

The community member is running the same ReAct agent code with GPT-4 and with Llama2-7b, and notices that Llama2 goes into an endless loop instead of reasoning like GPT-4. The comments suggest this could be because Llama2 is biased towards generating long output at the expense of precision, or it could be related to how the tools are being used. The community members also discuss potential issues with the setup, such as the user:/assistant: prefixes instead of Llama 2's [INST]/[/INST] tags, and whether the query_wrapper_prompt is helping or hurting. They suggest trying other LLMs like zephyr-7b-beta, which the documentation reports performs better as an agent.

Hey, I'm trying ReAct agent + gpt-4 vs llama2-7b. Both run the same code; the only difference is the LLM. Any idea why llama2 goes into an endless loop instead of reasoning like GPT-4?
Attachments
image.png
image.png
8 comments
On the Llama side, was that one generation, or actually multiple rounds of tool use?
If all of those pink lines came from just one pass of text generation, then it is because Llama is biased towards generating long output at the expense of precision. (Metaphorically, you could say that Llama isn't confident enough to stop talking when it has already said enough; much like a person with low self-esteem.)
If those pink lines actually came from multiple turns of using tools, then it might be because of the same reason underlying my question here: https://discord.com/channels/1059199217496772688/1198847208234160228
Another thing that caught my eye was the user: and assistant: prefixes in your Llama log.
Isn't Llama 2 supposed to use [INST] and [/INST] instead? Check out the "Meta Llama 2 Chat" preset I attached below:
Attachment
image.png
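For reference, here is a minimal sketch of that Llama 2 chat format, reconstructed from Meta's documented template rather than from the attached preset itself, so treat the exact strings as an approximation:

Python
# Rough reconstruction of the Llama 2 chat wrapping: [INST]/[/INST] plus <<SYS>> tags.
# The attached "Meta Llama 2 Chat" preset should produce something equivalent.
def wrap_llama2_chat(system_prompt: str, user_message: str) -> str:
    return (
        "<s>[INST] <<SYS>>\n"
        f"{system_prompt}\n"
        "<</SYS>>\n\n"
        f"{user_message} [/INST]"
    )

print(wrap_llama2_chat("You are a helpful assistant.", "Which tools can you use?"))

If the prompt the agent actually sends uses user: / assistant: prefixes instead, the model sees a format it wasn't tuned on, which can easily derail its reasoning.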
This is my setup. The problem is that it is not using any tools, and its reasoning is also very questionable. Am I doing something wrong, or is llama just not that good of an agent yet?
Attachment
image.png
I myself use OpenAILike instead of HuggingFaceLLM, so I actually don't know how most of the parameters you gave there would affect the quality.

For example, I don't know whether HuggingFaceLLM takes care of the prompt wrapping for you or not, so there is a chance that your query_wrapper_prompt is doing more harm than good.
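For context, a hypothetical sketch of the kind of HuggingFaceLLM setup being discussed, assuming the legacy llama_index 0.9-style imports used elsewhere in this thread; the model name and parameter values are placeholders, not the asker's actual config:

Python
from llama_index.llms import HuggingFaceLLM
from llama_index.prompts import PromptTemplate

# query_wrapper_prompt wraps each query before it reaches the model.
# If the chat template is also applied somewhere else in the stack,
# the [INST] tags end up doubled, which can degrade output quality.
query_wrapper_prompt = PromptTemplate("[INST] {query_str} [/INST]")

llm = HuggingFaceLLM(
    model_name="meta-llama/Llama-2-7b-chat-hf",      # placeholder
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",  # placeholder
    query_wrapper_prompt=query_wrapper_prompt,
    context_window=4096,
    max_new_tokens=256,
)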

Since you are running models locally anyway, would you like to try using OpenAILike and serving the LLM with LM Studio?

Here's how:

Python
from llama_index.llms import ChatMessage, OpenAILike

# Point OpenAILike at the local OpenAI-compatible server that LM Studio exposes.
llm = OpenAILike(
    api_base="http://localhost:1234/v1",
    timeout=600,  # secs
    api_key="loremIpsum",  # any placeholder works for a local server
    is_chat_model=True,
    context_window=32768,
)
chat_history = [
    ChatMessage(role="system", content="You are a bartender."),
    ChatMessage(role="user", content="What do I enjoy drinking?"),
]
output = llm.chat(chat_history)
print(output)

(copied from here: https://lmy.medium.com/comparing-langchain-and-llamaindex-with-4-tasks-2970140edf33)
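And to tie it back to the original ReAct question, a rough sketch of handing that llm to a ReAct agent; the multiply tool is just a stand-in so the example is runnable:

Python
from llama_index.agent import ReActAgent
from llama_index.tools import FunctionTool

def multiply(a: int, b: int) -> int:
    """Multiply two integers."""
    return a * b

# Build a ReAct agent around the OpenAILike llm defined above.
agent = ReActAgent.from_tools(
    [FunctionTool.from_defaults(fn=multiply)],
    llm=llm,
    verbose=True,  # print the Thought/Action/Observation loop
)
print(agent.chat("What is 21 times 2?"))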
Alternatively, are you open to trying out other LLMs? According to this doc:
https://docs.llamaindex.ai/en/stable/module_guides/models/llms.html#open-source-llms
llama2-chat-7b 4bit is known to be horrible as an agent.
But zephyr-7b-beta seems to perform well as an agent.
zephyr-7b-beta is also the LLM that I'm using personally, so I can say that it is doing decently well.
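If you do end up wrapping prompts for it manually (for example with a query_wrapper_prompt), note that zephyr-7b-beta uses a different chat format from Llama 2; this sketch follows the format described on its Hugging Face model card:

Python
# zephyr-7b-beta chat layout: <|system|>, <|user|>, <|assistant|> blocks,
# each turn terminated with </s>, with generation continuing after <|assistant|>.
def wrap_zephyr_chat(system_prompt: str, user_message: str) -> str:
    return (
        f"<|system|>\n{system_prompt}</s>\n"
        f"<|user|>\n{user_message}</s>\n"
        "<|assistant|>\n"
    )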