The nodes will be in response.source_nodes no?
@Logan M Thanks for your suggestion. Unfortunately, response.source_nodes is just an empty list, even though response.response has text from the node returned by the retriever function.
That means the agent didn't use your index/tool
@Logan M The tool was definitely used. Here is a snippet of the relevant code and the output.
### imports (assuming the llama_index v0.10+ package layout)
from typing import List

from llama_index.core import VectorStoreIndex
from llama_index.core.objects import SimpleObjectNodeMapping
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.tools import FunctionTool
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters
from llama_index.agent.openai import OpenAIAgent

### build an agent for each document using a vector store index and a summary
### index for each document, then create a node -> agent mapping
obj_node_mapping = SimpleObjectNodeMapping.from_objects(agents_dict.values())

### add metadata to nodes
nodes = []
for req_num, agent in agents_dict.items():
    nd = obj_node_mapping.to_node(agent)
    nd.text = extra_info_dict[req_num]["summary"]
    nd.metadata = {
        "name": f"request_{req_num}",
        "request_number": f"{req_num}",
    }
    nodes.append(nd)

### create index from nodes
object_index = VectorStoreIndex(nodes=nodes)
### define function for filtering using node metadata
def filter_retrieve_fn(
    query: str,
    filter_key_list: List[str],
    filter_value_list: List[str],
):
    query = query or "Query"
    exact_match_filters = [
        ExactMatchFilter(key=k, value=v)
        for k, v in zip(filter_key_list, filter_value_list)
    ]
    retriever = VectorIndexRetriever(
        object_index,
        filters=MetadataFilters(filters=exact_match_filters),
        similarity_top_k=top_k,  # VectorIndexRetriever takes similarity_top_k (not top_k); top_k is set elsewhere
    )
    nodes = retriever.retrieve(query)
    print(f"number of nodes: {len(nodes)}")
    print([nd.score for nd in nodes])
    return nodes
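### define the args schema for the tool
### (AutoRetrieveModel isn't shown in the snippet; this is an assumed sketch,
### inferred from the tool-call arguments in the output below)
from pydantic import BaseModel, Field

class AutoRetrieveModel(BaseModel):
    query: str = Field(..., description="natural language query string")
    filter_key_list: List[str] = Field(..., description="list of metadata filter field names")
    filter_value_list: List[str] = Field(..., description="list of metadata filter field values")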
### wrap function in a tool
filter_retrieve_tool = FunctionTool.from_defaults(
    fn=filter_retrieve_fn,
    name="document_autoretriever",
    description="",
    fn_schema=AutoRetrieveModel,
    return_direct=True,
)
### create agent using filtering function tool
agent = OpenAIAgent.from_tools(
    tools=[filter_retrieve_tool],
    llm=agent_llm,
    verbose=True,
)
response = agent.query("Give me a summary of request number 203?")
print(f'response source nodes: {response.source_nodes}')
Here is the output:
Added user message to memory: Give me a summary of request number 203?
=== Calling Function ===
Calling function: document_autoretriever with args: {"query":"summary of request number 203","filter_key_list":["request_number"],"filter_value_list":["203"]}
number of nodes: 1
[0.4076740694348696]
Got output: [NodeWithScore(node=TextNode(id_='-4923860921409334662', embedding=None, metadata={'name': 'request_203', 'request_number': '203'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text='Summary of Request 203:', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n'), score=0.4076740694348696)]
========================
response source nodes: []
@Logan M It looks like there is no current way to do this. Let me share the issue I was having when I tried using a VectorIndexAutoRetriever. If the query required a simple filter (e.g., filter operator EQ), it worked fine, but if the filter operation was IN, it would throw an error:
vector_node_auto_retriever = VectorIndexAutoRetriever(
    index=object_index,
    vector_store_info=agent_vector_store_info,
    llm=agent_llm,
    verbose=True,
)
vector_node_auto_retriever.retrieve("What common solutions are proposed in requests 203 and 237?")
Would give the error:
TypeError: 'in <string>' requires string as left operand, not list
ohhh wait, looking at this now, source nodes would of course be empty, because your tool is returning nodes themselves, rather than running a query engine
try response.sources; it has the sources for all tool calls
As for the filtering thing, filtering can be wacky for some vector stores
@Logan M response doesn't have any attribute called sources; I only see the attributes 'response', 'source_nodes', and 'metadata'.
once again, I am blind -- probably because you are using agent.query() instead of agent.chat() ?
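Roughly, the difference looks like this (a sketch, assuming the agent from your snippet; an AgentChatResponse carries tool outputs):
# agent.query() returns a query-engine-style response
response = agent.query("Give me a summary of request number 203?")

# agent.chat() returns an AgentChatResponse, which records tool outputs
chat_response = agent.chat("Give me a summary of request number 203?")
print(chat_response.sources)       # ToolOutput objects, one per tool call
print(chat_response.source_nodes)  # populated when the tool outputs carry nodes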
@Logan M Both sources and source_nodes are empty when using agent.chat. By the way, I really appreciate all your help!
Here's the big-picture issue I'm facing: I want to build multi-document agents. I started with your tutorial notebook for multi-document agents, which outlines the following workflow:
-> Index each document.
-> Build a query engine tool for each document index.
-> Wrap an agent around each document query engine tool.
-> Wrap a query engine tool around each agent, with metadata fields name and description.
-> Make an ObjectIndex and node retriever using all the agent query engine tools.
-> Wrap the node retriever in a CustomRetriever class.
-> Wrap the CustomRetriever with the final agent.
After this complicated setup, at query time, the agent uses the name and description fields of each document's agent query engine to retrieve the relevant agent(s); a sketch of steps 4-6 is below. This doesn't work well except for naive queries.
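Roughly, steps 4-6 look like this (the doc_agents dict and tool names here are just placeholders):
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# step 4: wrap a query engine tool around each per-document agent
all_tools = [
    QueryEngineTool(
        query_engine=agent,  # each document agent acts as a query engine
        metadata=ToolMetadata(
            name=f"tool_{doc_name}",
            description=f"Answers questions about document {doc_name}.",
        ),
    )
    for doc_name, agent in doc_agents.items()
]

# steps 5-6: index the tools as objects and retrieve them at query time
obj_index = ObjectIndex.from_objects(all_tools, index_cls=VectorStoreIndex)
tool_retriever = obj_index.as_retriever(similarity_top_k=3)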
I am experimenting with autoretrieval to get more fine-grained control (e.g., retrieving using document names, authors, etc.). Even if I get autoretrieval to work (which I haven't), I have a feeling I won't be happy with the performance.
Do you have any advice on what approach to try? I'm thinking of using a custom similarity score with Weaviate or Elasticsearch (because I don't think it's possible with just LlamaIndex), where I can have a weighted score based on similarity with metadata such as document name and author, plus similarity with the text. Do you think this is a good idea?
Thanks again for all your help.
@Logan M I would appreciate any advice you can give me on what approach to take to make a good multi-document agent. Sorry to keep bothering you; I've just spent a lot of time trying the suggested approaches and they are not working well. Thanks again.
Sorry, this is a huge wall of text and I just forgot about it lmao
It's best to debug this from the bottom up:
- do my query engines work well? If not, do I need better ingestion? Reranking?
- is a single agent able to work well and pick the correct tool? If not, do I need better tool names/descriptions?
- am I able to retrieve the relevant agent using an object retriever? If not, do I need better retrieval? A reranker? Other postprocessors? Better agent names/descriptions?
- is my top-level agent able to select the correct agent from the list of retrieved objects/agents? If not, do I need better agent names/descriptions?
@Logan M Sorry for the wall of text! In short, the failure point is the object retriever, because queries often include a document name or author, and the retriever doesn't do well with those. That's why I'm trying metadata filters with auto-retrievers, which almost works, but I can't get the agent to return the nodes. So, what do I try next?
Interesting. Have you tried reranking yet?
Setting the top-k to like 20, but then reranking and returning like the top-2 or whatever from there?
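Something like this, with SentenceTransformerRerank (a sketch; the cross-encoder model name is just an example):
from llama_index.core.postprocessor import SentenceTransformerRerank

# retrieve a generous candidate set, then rerank down to a handful
reranker = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-6-v2", top_n=2
)
retriever = object_index.as_retriever(similarity_top_k=20)
candidates = retriever.retrieve("summary of request number 203")
reranked = reranker.postprocess_nodes(candidates, query_str="summary of request number 203")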
Auto retrieval is flaky, I agree
@Logan M Well, the problem with the VectorIndexAutoRetriever is that it often throws errors when it has to do IN or CONTAINS operations. That is why I made my own auto-retriever from an OpenAIAgent with a filtering function as the tool. That approach does much better, but as we discussed at the beginning of this thread, I can't get the agent to return nodes (or I can't get nodes from the response object)...
In filter_retrieve_fn(), change the return type, something like:
from llama_index.core.base.response.schema import Response

def filter_retrieve_fn():
    ...
    return Response(
        response="\n----\n".join([x.text for x in nodes]),
        source_nodes=nodes,
    )
Then it will get picked up automatically
@Logan M That did the trick. Thank you!!
Okay, here's one last question on this thread. I still feel like I am going to want more and more fine-grained control over what is returned. I am thinking of delving into defining a custom similarity score with Weaviate or Elasticsearch. Any thoughts? Are there any examples of doing this?