
Hey, I'm trying a ReAct agent with gpt-4 vs llama2-7b. Both use the same code; the only difference is the LLM. Any idea why llama2 goes into an endless loop instead of reasoning like GPT does?
Attachments
image.png
image.png
On the Llama side, was that one generation, or actually multiple turns of tool use?
If all of those pink lines came from just one pass of text generation, then it's because Llama is biased toward generating long output at the expense of precision. (Metaphorically, you could say that Llama isn't confident enough to stop talking when it has already said enough; much like a person with low self-esteem.)
If those pink lines actually came from multiple turns of tool use, then it might be for the same reason underlying my question here: https://discord.com/channels/1059199217496772688/1198847208234160228
Another thing that caught my eye was the user: and assistant: prefixes in your Llama log.
Isn't Llama using [INST] and [/INST] instead? Check out the "Meta Llama 2 Chat" preset I attached below:
Attachment
image.png
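
For reference, Llama 2's chat template wraps each turn roughly like this (a sketch; the system message here is just a placeholder):

Plain Text
<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

What do I enjoy drinking? [/INST]
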
This is my setup. The problem is that it isn't using any tools, and its reasoning is also very questionable. Am I doing something wrong, or is llama just not that good of an agent yet?
Attachment
image.png
I myself use OpenAILike instead of HuggingFaceLLM, so I actually don't know how most of the parameters you gave there would affect the quality.

For example, I don't know whether HuggingFaceLLM takes care of the prompt wrapping for you or not, so there is a chance that your query_wrapper_prompt is doing more harm than good.
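
If you do keep HuggingFaceLLM, it's worth checking what the final prompt actually looks like. For Llama 2, the wrapper would typically be something like this (a sketch, assuming HuggingFaceLLM does not already apply the chat template for you):

Plain Text
from llama_index.prompts import PromptTemplate

# Llama 2's instruction tags; LlamaIndex fills in {query_str}
query_wrapper_prompt = PromptTemplate("[INST] {query_str} [/INST]")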

Since you are running models locally anyway, would you like to try using OpenAILike and serving the LLM with LM Studio?

Here's how:

Plain Text
from llama_index.llms import ChatMessage, OpenAILike

llm = OpenAILike(
    api_base="http://localhost:1234/v1",
    timeout=600,  # secs
    api_key="loremIpsum",
    is_chat_model=True,
    context_window=32768,
)
chat_history = [
    ChatMessage(role="system", content="You are a bartender."),
    ChatMessage(role="user", content="What do I enjoy drinking?"),
]
output = llm.chat(chat_history)
print(output)

(copied from here: https://lmy.medium.com/comparing-langchain-and-llamaindex-with-4-tasks-2970140edf33)
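
From there, you can hand the same llm object to your ReAct agent (a sketch; tools is a placeholder for whatever tools you were already passing in):

Plain Text
from llama_index.agent import ReActAgent

# `tools` is a placeholder for the tool list you already had
agent = ReActAgent.from_tools(tools, llm=llm, verbose=True)
print(agent.chat("What do I enjoy drinking?"))
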
Alternatively, are you open to trying out other LLMs? According to this doc:
https://docs.llamaindex.ai/en/stable/module_guides/models/llms.html#open-source-llms
llama2-chat-7b 4bit is known to be horrible as an agent,
but zephyr-7b-beta seems to perform well as an agent.
zephyr-7b-beta is also the LLM that I'm using personally, so I can say that it does decently well.