Updated 10 months ago

Agent can't access data in query_engine_tools

I posted my issue to the wrong thread yesterday. I'm sorry I hogged that thread; I've only just noticed what I did, so I deleted it.
The thing is, my agent says it doesn't have access to the dataset. I created query_engine_tools, and when I iterate over them with a query, I get correct answers about the dataset. However, when I use them in an agent, the LLM says it doesn't have access to the dataset. I've been trying to fix my code for two days and I'm out of ideas. I hope someone has had this same problem and found a solution, because I haven't found one anywhere.
6 comments
This is my code; I'm using an Azure OpenAI LLM and embeddings:
Plain Text
import pandas as pd
from llama_index.core import Document, VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.agent import AgentRunner
from llama_index.agent.openai import OpenAIAgentWorker
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.callbacks import CallbackManager

# llm is assumed to be an already-configured Azure OpenAI instance
df = pd.read_csv("/Files/email_data_partial.csv")

callback_manager = CallbackManager([])
node_parser = SentenceSplitter()

query_engine_tools = []

for index, row in df.iterrows():
    email_text = f"Subject: {row['Subject']}\nFrom: {row['From']}\nTo: {row['To']}\nDate: {row['Date']}\n\n{row['Body']}"
    doc = Document(text=email_text, metadata={
        "message_id": row["Message-ID"],
        "date": row["Date"],
        "sender": row["From"],
        "recipient": row["To"],
        "subject": row["Subject"]
    })

    # get_nodes_from_documents returns a list of nodes
    nodes = node_parser.get_nodes_from_documents([doc])
    content_index = VectorStoreIndex(nodes, callback_manager=callback_manager)
    content_index.storage_context.persist(persist_dir=f"./data/email_{index}")
    content_query_engine = content_index.as_query_engine(llm=llm)

    tool_metadata = ToolMetadata(
        name=f"vector_tool_{doc.metadata['message_id']}",
        description="Useful for analyzing specific aspects of this email."
    )

    query_engine_tools.append(
        QueryEngineTool(
            query_engine=content_query_engine,
            metadata=tool_metadata
        )
    )
email_analysis_agent = OpenAIAgentWorker.from_tools(
    query_engine_tools, llm=llm, verbose=True
)
I get the correct response with this:
Plain Text
for tool in query_engine_tools:
    test_response = tool.query_engine.query("Summarize the email")
    print(test_response)

But when I create an agent I get nothing:
Plain Text
agent_runner = AgentRunner(email_analysis_agent)
response = agent_runner.chat("Who is the most frequent sender and who is the most frequent recipient?")
print(response)

This returns:
Plain Text
Added user message to memory: Who is the most frequent sender and who is the most frequent recipient?
To determine the most frequent sender and recipient, we would need specific information about the context or dataset being referred to. Without any additional details, it is not possible to provide an accurate answer.
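As a sanity check, that particular question is a plain aggregation and doesn't need the agent at all; it can be computed straight from the dataframe columns. A minimal sketch (the tiny inline dataframe is a hypothetical stand-in for email_data_partial.csv) to verify what answer the agent should be producing:

```python
import pandas as pd

# Hypothetical sample standing in for email_data_partial.csv
df = pd.DataFrame({
    "From": ["alice@x.com", "bob@x.com", "alice@x.com"],
    "To": ["bob@x.com", "carol@x.com", "bob@x.com"],
})

# Most frequent sender and recipient, computed directly from the columns
top_sender = df["From"].value_counts().idxmax()
top_recipient = df["To"].value_counts().idxmax()
print(top_sender, top_recipient)  # alice@x.com bob@x.com
```

Comparing this ground-truth answer against the agent's output makes it easier to tell whether the tools are being called at all.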
I used this notebook as inspiration, but when I asked questions about cities not in the list, the agents replied correctly.
https://github.com/run-llama/llama_index/blob/f470398e99a97293bd242bb7ea6083d326484f6e/docs/examples/agent/agent_runner/agent_runner_rag.ipynb#L232
It seems to me you are giving each tool the same description -- each description should be unique, this is what the agent looks at when picking a tool
I changed the description to this:
Plain Text
tool_metadata = ToolMetadata(
    name=f"vector_tool_{email_metadata['message_id']}",
    description=f"Useful for analyzing specific aspects of this email with the id: {email_metadata['message_id']}."
)
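One more thing worth checking (an assumption on my part, not confirmed from the thread): OpenAI tool/function names must match `^[a-zA-Z0-9_-]{1,64}$`, and raw email Message-ID values typically contain angle brackets, dots, and `@`, so interpolating them into the tool name can produce invalid names that the model then ignores. A minimal sanitizer sketch:

```python
import re

def sanitize_tool_name(message_id: str) -> str:
    """Replace characters OpenAI does not accept in tool names with underscores."""
    cleaned = re.sub(r"[^a-zA-Z0-9_-]", "_", message_id)
    return f"vector_tool_{cleaned}"[:64]  # names are also length-limited

print(sanitize_tool_name("<18782981.1075855378110.JavaMail@thyme>"))
```

If the raw IDs were invalid, this alone could explain the tools never being selected.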
No matter what I do, I get this answer: I apologize, but as an AI language model, I do not have access to external datasets or the ability to browse the internet. Therefore, I cannot read or summarize specific emails from Mark Sagel to John Arnold. However, if you provide me with the content or specific details of the emails, I would be happy to help you summarize them.

The same happens with this notebook: https://github.com/run-llama/llama_index/blob/f470398e99a97293bd242bb7ea6083d326484f6e/docs/examples/agent/agent_runner/agent_runner_rag.ipynb#L232

Strangely, I had problems with this notebook as well. The LLM simply made up an answer (all of the calculations were wrong) and didn't use the tools at all: https://docs.llamaindex.ai/en/stable/examples/agent/openai_agent.html

Is Azure OpenAI an issue? I've checked the source code and I can't find anything that would cause problems.
My guess is that the problem is Azure OpenAI. Many users are or were reporting problems when using it. I tried implementing function calling from scratch and hit the same problem: the tools/functions are never called.
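When debugging a from-scratch function-calling attempt, it helps to separate the local dispatch logic from the model call so each half can be tested on its own. A minimal sketch (model response mocked; the tool and its data are hypothetical) of the dispatch half of an OpenAI-style function-calling loop:

```python
import json

# Hypothetical local tool the model may call
def count_senders(sender: str) -> int:
    emails = [{"From": "alice"}, {"From": "bob"}, {"From": "alice"}]
    return sum(1 for e in emails if e["From"] == sender)

TOOL_REGISTRY = {"count_senders": count_senders}

def dispatch(tool_call: dict) -> str:
    """Run one tool call shaped like the chat completions API's tool_calls entries."""
    fn = TOOL_REGISTRY[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return json.dumps(fn(**args))

# Mocked model output in place of response.choices[0].message.tool_calls[0]
mock_call = {"function": {"name": "count_senders",
                          "arguments": json.dumps({"sender": "alice"})}}
print(dispatch(mock_call))  # prints 2
```

If the dispatch half works with a mocked call but the live model never emits tool_calls, that points at the model/deployment side (e.g. an Azure deployment or API version without function-calling support) rather than the local code.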