I myself use OpenAILike instead of HuggingFaceLLM, so I don't actually know how most of the parameters you listed would affect output quality. For example, I don't know whether HuggingFaceLLM takes care of prompt wrapping for you, so there is a chance that your query_wrapper_prompt is doing more harm than good.
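To illustrate the concern (a purely hypothetical sketch assuming a Mistral-style [INST] template; I don't know what HuggingFaceLLM actually does internally):

# Hypothetical illustration only: what the final prompt could look like if both
# your query_wrapper_prompt and the model's own chat template add instruction tags.
query_wrapper_prompt = "[INST] {query_str} [/INST]"  # your wrapper
wrapped_once = query_wrapper_prompt.format(query_str="What do I enjoy drinking?")
wrapped_twice = "[INST] " + wrapped_once + " [/INST]"  # a second, unwanted wrap
print(wrapped_twice)
# [INST] [INST] What do I enjoy drinking? [/INST] [/INST]  <- likely to confuse the model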
Since you are running models locally anyway, would you like to try OpenAILike and serve the LLM with LM Studio? Here's how:
from llama_index.llms import ChatMessage, OpenAILike

llm = OpenAILike(
    api_base="http://localhost:1234/v1",  # LM Studio's default local server endpoint
    timeout=600,  # secs
    api_key="loremIpsum",  # placeholder; the local server doesn't check the key
    is_chat_model=True,
    context_window=32768,
)
chat_history = [
    ChatMessage(role="system", content="You are a bartender."),
    ChatMessage(role="user", content="What do I enjoy drinking?"),
]
output = llm.chat(chat_history)
print(output)
(Copied from here: https://lmy.medium.com/comparing-langchain-and-llamaindex-with-4-tasks-2970140edf33)
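If you then want LlamaIndex's query engines to pick this LLM up, you can plug it into a ServiceContext. This is just a rough sketch assuming the pre-0.10 llama_index layout implied by the import above, plus a hypothetical ./data folder and sentence-transformers installed for the local embeddings:

from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex

# embed_model="local" keeps embeddings on your machine too (needs sentence-transformers)
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")
documents = SimpleDirectoryReader("data").load_data()  # hypothetical ./data folder
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
print(index.as_query_engine().query("What do I enjoy drinking?"))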