
Agent

Plain Text
return OpenAIAgent.from_tools(
        tools=[query_engine_tool],
        llm=get_default_llm(),
        chat_history=history,
    )

If I define the LLM here in the tool, will it use it for the reasoning, but can I use another LLM for the final output?

I find coding models and GPT-4 do great at tool usage and such, but sometimes I want the final generation done by a different model.
The llm defined here is responsible for figuring out which tools to use, interpreting tool output, and responding to the user.

What llm each tool uses depends on how you defined it. For example, here the query engine tool will use the llm attached to that index.
Hope that makes some sense lol
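For example, here's a rough sketch of mixing two models, using the same legacy (pre-0.10) llama_index API as your snippet (the OpenAI model names are just placeholders): the LLM attached to the index drives the query engine tool, while the llm passed to the agent handles the reasoning and the final reply.

Plain Text
from llama_index import ServiceContext, VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms import OpenAI
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.agent import OpenAIAgent

# LLM used inside the tool: attached to the index via its ServiceContext
tool_llm = OpenAI(model="gpt-3.5-turbo")
tool_context = ServiceContext.from_defaults(llm=tool_llm)

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=tool_context)

query_engine_tool = QueryEngineTool(
    query_engine=index.as_query_engine(similarity_top_k=3),
    metadata=ToolMetadata(
        name="repair_jobs",
        description="Provides detailed information about repair jobs",
    ),
)

# LLM used by the agent: tool selection, interpreting tool output, final reply
agent = OpenAIAgent.from_tools(
    tools=[query_engine_tool],
    llm=OpenAI(model="gpt-4"),
)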
Plain Text
 from llama_index.llms import Replicate

llama2_7b_chat = "meta/llama-2-7b-chat:8e6975e5ed6174911a6ff3d60540dfd4844201974602551e10e9e87ab143d81e"
llm = Replicate(
    model=llama2_7b_chat,
    temperature=0.01,
    additional_kwargs={"top_p": 1, "max_new_tokens": 300},
)


# set tokenizer to match LLM
from llama_index import set_global_tokenizer
from transformers import AutoTokenizer

set_global_tokenizer(AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-chat-hf").encode)

from llama_index.embeddings import HuggingFaceEmbedding
from llama_index import ServiceContext

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)

from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

repair_job_engine = index.as_query_engine(similarity_top_k=3)

from llama_index.tools import QueryEngineTool
from llama_index.tools import ToolMetadata

query_engine_tools = [
    QueryEngineTool(
        query_engine=repair_job_engine,
        metadata=ToolMetadata(
            name="query_specified_repair_jobs",
            description="Provides information about repair jobs in detail "
        )
    )
]


from llama_index.agent import ReActAgent

agent = ReActAgent.from_tools(
    query_engine_tools,
    llm=llm,
    verbose=True,
    # context=context
)

response = agent.chat("What are the main failure modes in the file? ")
print(str(response))


I get:

Plain Text
Thought: I need to use a tool to help me answer the question.
Action: tool
Action Input: {'input': 'hello world', 'num_beams': 5}
KeyError: 'tool'

I think I'm following the guide closely and trying to use the agent to answer some questions about the loaded file. I also specify the tool for the ReActAgent, but it seems the chat function doesn't recognize the tool I specified. Can someone help?
Open-source LLMs are very bad at agentic tasks. Here, you can see it's completely hallucinating the tool name and the input.
Even Llama 2, which is almost the best open-source LLM, isn't capable of making this work?
llama2 isn't actually that great -- but in any case, yea, agent tasks are very hard for open-source LLMs

Look at the performance gap on benchmarks
[Attachment: image.png (benchmark comparison chart)]
The gap between gpt-4 and everything else is kind of wild
Yes, I do see the gap, and I'm fine with the content quality. But correct me if I'm wrong: I think the error here is more of a functional bug, where the LlamaIndex agent fails to find the Tool I defined and passed explicitly to it, as the KeyError shows.
The key error is because the LLM completely hallucinated. There is no tool called "tool", and the input it generated {'input': 'hello world', 'num_beams': 5} does not seem helpful or correct given the input query
Your tool name was query_specified_repair_jobs
A correct react output might look like

Plain Text
Thought: I need to use a tool to help me answer the question.
Action: query_specified_repair_jobs
Action Input: {'input': 'What are the main failure modes for the repair jobs?'}
Yes, it should look like this
Is there anything else I can try? If it doesn't work this way, how did people measure agent performance for AgentBench? This task should be a very common use case for many people who don't want to use OpenAI.
The performance on agent bench is likely as low as it is because of errors just like this.

My suggestion is to use a newer model (Zephyr, Mistral). You can also try changing the ReAct prompts, but tbh it's quite tricky -- a rough sketch of both is below.
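For reference, a minimal sketch of both ideas against the same legacy llama_index API as above; the Zephyr model name, the extra header text, and the ReActChatFormatter / react_chat_formatter imports and argument are assumptions about your installed version, so double-check them:

Plain Text
from llama_index.llms import HuggingFaceLLM
from llama_index.agent import ReActAgent
from llama_index.agent.react.formatter import ReActChatFormatter
from llama_index.agent.react.prompts import REACT_CHAT_SYSTEM_HEADER

# Swap in a newer open-source chat model (Zephyr) in place of llama-2-7b-chat
llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-beta",
    tokenizer_name="HuggingFaceH4/zephyr-7b-beta",
    context_window=3900,
    max_new_tokens=300,
)

# Nudge the ReAct prompt: keep the default header (it contains the
# {tool_desc}/{tool_names} placeholders) and append a stricter instruction
custom_header = REACT_CHAT_SYSTEM_HEADER + (
    "\nOnly use tool names exactly as listed above; never invent a tool name.\n"
)
formatter = ReActChatFormatter.from_defaults(system_header=custom_header)

agent = ReActAgent.from_tools(
    query_engine_tools,
    llm=llm,
    react_chat_formatter=formatter,
    verbose=True,
)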
I've been doing this for months now and trust me, using open source models for agents is just not ready yet. Pretty unreliable right now.

I'd be surprised if anyone is using open-source models for agents in production. And if they are, they are likely fine-tuned by the company and/or using highly custom code
Alright, thank you @Logan M for the good advice! Will play around with other models.