Is the agent actually using your index?
Or is it not using it when it should?
Yes, it is actually using it. But it seems to know very little about my documents.
Could it be because I used save_to_disk and load_from_disk with GPTSimpleVectorIndex and ComposableGraph?
I used GPTSimpleVectorIndex.save_to_disk to save the index as JSON. Can I then load it via load_from_disk() and query it without the original documents?
Yup that's fine to do.
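For reference, the save/load round trip looks roughly like this with the 0.5.x API (the ./data directory and index.json path are just placeholders):

from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# build the index once from the raw documents
documents = SimpleDirectoryReader("./data").load_data()
index = GPTSimpleVectorIndex.from_documents(documents)
index.save_to_disk("index.json")

# later: reload and query, no original documents needed
index = GPTSimpleVectorIndex.load_from_disk("index.json")
response = index.query("your question here")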
What kinds of questions are you asking? What do the documents kind of look like?
I would try testing the index outside of the agent loop, to see what's going on.
You can check the source nodes for each response:
response = index.query(...)
print(response.source_nodes)
You may need to adjust some settings like similarity_top_k
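For example, something like this (using the question from this thread; similarity_top_k is the main knob to experiment with):

response = index.query(
    "What were Gill Halldorsson's first impressions of Turcich?",
    similarity_top_k=3,  # retrieve more chunks than the default of 1
)
print(response.source_nodes)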
OK, I will test with a single GPTSimpleVectorIndex; I think it could give a better answer.
Besides, I pass this config to GraphToolConfig. Is there anything wrong with it?
postprocessor = EmbeddingRecencyPostprocessor(service_context=service_context_3_5)

# define query configs for graph
query_configs = [
    {
        "index_struct_type": "simple_dict",
        "query_mode": "default",
        "query_kwargs": {
            "similarity_top_k": 1,
            "node_postprocessors": [postprocessor],
            # "include_summary": True
        },
        "query_transform": decompose_transform
    },
    {
        "index_struct_type": "list",
        "query_mode": "default",
        "query_kwargs": {
            "response_mode": "default",
            "node_postprocessors": [postprocessor],
            "verbose": True
        }
    },
]
As you add more documents, you might need to increase the top k in your config.
You could try adding this:
# define query configs for graph
query_configs = [
    {
        "index_struct_type": "simple_dict",
        "query_mode": "default",
        "query_kwargs": {
            "similarity_top_k": 3,  # <-- increased
            "response_mode": "compact",  # <-- helps response times be quicker, but is optional
            "node_postprocessors": [postprocessor],
            # "include_summary": True
        },
        "query_transform": decompose_transform
    },
    {
        "index_struct_type": "list",
        "query_mode": "default",
        "query_kwargs": {
            "response_mode": "default",
            "node_postprocessors": [postprocessor],
            "verbose": True
        }
    },
]
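Then pass the configs in when querying the graph (in 0.5.x, query_configs goes straight into graph.query):

response = graph.query("your question here", query_configs=query_configs)
print(response)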
OK, thanks for your help. I will run more tests.
Hello @Logan M. I found a case that reproduces the issue I mentioned. Please help me analyze this problem.
Steps to reproduce:
1. I scraped this page's content (
https://www.theguardian.com/lifeandstyle/2023/apr/11/the-man-who-walked-around-the-world-tom-turcich-seven-year-search-meaning-of-life) via Playwright and readabilityjs.
The body content of this URL was definitely scraped in full.
2. I loaded the documents into a GPTSimpleVectorIndex, then saved it to disk via save_to_disk().
3. I created a ComposableGraph from several indices (including the index above), and built an agent using create_llama_chat_agent().
4. I asked this agent: "What were Gill Halldorsson's first impressions of Turcich?" The answer is covered in paragraph E of the article.
agent output:"I'm sorry, but I still do not have any information on Gill Halldorsson's first impressions of Tom Turcich. Without more context or information about who Gill Halldorsson is and in what context they met Tom Turcich, I am unable to provide a response."
5. I queried the same question against the GPTSimpleVectorIndex loaded via load_from_disk().
index output:"Halldorsson's first impressions of Turcich were that he was the most interesting guy in the world and that he was walking across the world. However, his girlfriend sensed that Turcich was a bit lonely or tired of always saying goodbye to people he clicked with. Turcich had embarked on a seven-year journey to walk around the world and had discovered a lot about the world and himself."
6. Finally, I queried this question against the saved ComposableGraph.
graph output:"The original answer remains the same as the new context does not provide any additional information about Gill Halldorsson's first impressions of Turcich."
This answer is probably because the question had already been answered by the index step.
I inspected each node's output in source_nodes; the node with the right answer (the same one the index gave) was the 4th node of source_nodes. What does this mean? Should I increase the value of similarity_top_k again?
In short, at least this time the GPTSimpleVectorIndex output the correct answer, but the agent could not.
Besides, in the graph response's source_nodes, the score of the correct answer's node was null.
Yea that response is pretty common when using gpt3.5.
OpenAI has changed the model recently and it seems much worse tbh. Working on fixing the prompts inside llama index to better handle this, but it struggles with answer refinement (hence, the weird answer it gives you from the agent/graph)
Is this abnormal? Shouldn't the response come from the node with the max score in source_nodes?
The answer actually comes from all the nodes listed under source_nodes
The index fetches the top_k according to similarity score, and then uses all top_k nodes to generate the answer.
This is done using a refine process, where if multiple nodes are retrieved (or response mode=compact is used and all the text doesn't fit into one LLM call), the answer is refined across a few LLM calls
How it works is basically it makes the first call to get an initial answer
Then it sends the query and the existing answer, along with the text from the next chunk, and asks the model to update its answer with the new text or repeat its original answer.
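As a rough illustration of that refine loop (a simplified sketch, not LlamaIndex's actual code; complete here is a hypothetical function that makes a single LLM call and returns text):

def refine_answer(complete, query, chunks):
    # first call: get an initial answer from the first retrieved chunk
    answer = complete(
        f"Context: {chunks[0]}\n"
        f"Given the context, answer the query: {query}"
    )
    # then refine the answer across the remaining chunks, one LLM call each
    for chunk in chunks[1:]:
        answer = complete(
            f"The original query is: {query}\n"
            f"The existing answer is: {answer}\n"
            f"New context: {chunk}\n"
            "Refine the existing answer using the new context, "
            "or repeat the existing answer if the context is not useful."
        )
    return answer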
Logan, thank you for providing these details. I still need some time to understand and digest them.
Hi @Logan M, picking up where I left off last time, I found something new.
I had forgotten to pass the query_configs param to the graph last time. I have now upgraded LlamaIndex to the latest version (0.5.18), and graph.query() returns the correct answer even without passing query_configs; the answer is succinct and to the point.
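That is, just something like this (graph being the ComposableGraph loaded from disk):

response = graph.query("What were Gill Halldorsson's first impressions of Turcich?")
print(response)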
But the agent still doesn't work well. When I ask the agent the same question repeatedly, it never uses a tool the first time. Even when it does use a tool from the second time onward, it still can't give a good answer, and it may respond with a different answer each time it uses the tool.
For the same question I asked last time, the agent's verbose log:
First time->
Thought: Do I need to use a tool? No
AI: I'm sorry, but I don't have enough context to answer your question. Could you please provide more information about who Gill Halldorsson and Tom Turcich are?
Second time->
Thought: Do I need to use a tool? Yes
Action: Vector Index 0
Action Input: Gill Halldorsson's first impressions of Tom Turcich
The new context provides additional information about Tom Turcich's experiences during his seven-year journey to walk around the world. While he faced scary situations, such as encountering tarantulas and snakes while sleeping in the woods in Costa Rica and paying to spend the night indoors in dangerous areas like El Salvador, he also faced more serious threats, such as being held up at knifepoint in Panama City and being detained by plain-clothed military in Turkey. Despite these incidents, Tom met many wonderful people along the way and returned with even more faith in humankind. However, it is unclear if this new context provides any additional information about Gill Halldorsson's first impressions of Tom Turcich. Therefore, the original answer still stands.
Is this because of a problem in the handoff between the graph and the agent? How can I improve it?
So with the agent, whether or not it chooses to use a tool is completely dependent on the description of the tool.
Sometimes you have to get extremely creative and verbose with the tool descriptions
Why, for the same question, does it not use a tool the first time, but then always use a tool the second time?
By "the description of the tool", do you mean the description in GraphToolConfig or the one in IndexToolConfig? Which has the greater impact?
Both, since they are both tools.
I'm not sure why when asked twice tbh. It's up to the LLM to decide and they aren't always the smartest. The record of the previous question asked might be biasing it
The descriptions in my IndexToolConfigs are verbose enough. But the GraphToolConfig contains many indexes, so I haven't given it a concrete description of those indices.
Should I force the agent to use a tool? If so, how?
Right, the graph contains many indexes, but there should be an overall reason for picking that tool right?
"Useful for when you need to find information about XX. You should use this tool when the user is asking about topics that relate to XX, like YY, ZZ, and so on"
Does it make sense if I have a graph that contains a lot of different topics, with those topics distributed across the indexes?
I think that's fine? The graph should be able to figure out the query, hopefully.
Well, I will try writing a description for the GraphToolConfig and see how it reacts.
Hi @Logan M, hello. I still have some confusion about the method of creating an agent_chain. Currently, I am using the create_llama_chat_agent function, which requires a toolkit parameter. The toolkit is composed of index_configs and graph_configs.
However, I found that if I use the initialize_agent function to create the agent, I can simply build a tool from just the graph (without index tools) as a param. Since the graph itself already contains indexes, is there any advantage to using such a complex toolkit to create the agent?
You can create the agent any way you want tbh. You could use create_llama_chat_agent with a single graph tool, or you can use initialize_agent and create the custom tool yourself.
Whatever works best for you, there's no single way to do it.
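For the initialize_agent route, a minimal sketch (reusing the graph_tool from above and your llm_predictor_3_5; the memory setup here is an assumption):

from langchain.agents import initialize_agent
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(memory_key="chat_history")
agent = initialize_agent(
    [graph_tool],
    llm_predictor_3_5.llm,
    agent="conversational-react-description",
    memory=memory,
    verbose=True,
)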
Good. I just wanted to confirm whether providing both index_tools and graph_tools to the agent is better for queries than providing only graph_tools?
I think it just depends. Providing the individual indexes lets the agent have more options when responding to queries. But a single graph can also be preferable if it's set up well.
Okay, I see. Thank you for always providing help in a timely manner.
Hi @Logan M, I got this error when chatting with the agent:
Entering new AgentExecutor chain...
INFO:openai:error_code=context_length_exceeded error_message="This model's maximum context length is 4097 tokens. However, you requested 4894 tokens (3894 in the messages, 1000 in the completion). Please reduce the length of the messages or completion." error_param=messages error_type=invalid_request_error message='OpenAI API error received' stream_error=False
But I don't think it had loaded the tools at that point; agent.memory.chat_memory.messages is currently [].
Why did I get this error, and how can I reduce the token count of the messages?
I start the agent this way:
agent = initialize_agent(
    [] if tools is None else tools,
    llm_predictor_3_5.llm,
    agent="conversational-react-description",
    memory=memory,
    verbose=True,
)
my settings:
from langchain.chat_models import ChatOpenAI
from llama_index import LLMPredictor, PromptHelper, ServiceContext

max_input_size = 4000
num_output = 1000
max_chunk_overlap = 20
chunk_size_limit = 1400

# note: gpt-3.5-turbo's context is 4097 tokens, and max_tokens=num_output (1000)
# is reserved for the completion, so the messages themselves must fit in ~3097 tokens
llm_predictor_3_5 = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", max_tokens=num_output))
service_context_3_5 = ServiceContext.from_defaults(
    llm_predictor=llm_predictor_3_5,
    prompt_helper=PromptHelper.from_llm_predictor(llm_predictor_3_5),
    chunk_size_limit=chunk_size_limit,
)