Question about retrievers and metadata

At a glance

A community member is trying to use metadata filters so that a query engine retrieves only nodes from the right source document, but the results are not being filtered: nodes from other documents still come back. They share the code they tried, which builds a VectorIndexRetriever and wraps it in a RetrieverQueryEngine.

In the comments, another community member points out that the correct keyword argument is filters=filters, not metadata_filters. The original poster confirms that this change appears to fix the problem.

The original poster also mentions that they got the code snippets from the LlamaIndex chatbot, which they say often gives wrong code examples, and offers to volunteer to help improve the chatbot's responses.

Another community member clarifies that the LlamaIndex team does not maintain that chatbot: the one on Discord is made by kapa.ai, and the one on the documentation site is made by Mendable.

ttheta

Question about retrievers and metadata filters. I'm trying to use metadata filters to get the correct nodes with a query engine, because the results are often wrong. All of my source documents have a metadata key "source" that contains the URL of the document. I tried the following code with a RetrieverQueryEngine, but the results don't appear to be filtered: I'm still getting back nodes from the wrong documents. Can someone let me know if the code is implemented correctly?

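# Imports as of llama_index 0.8.x (2023); newer releases expose these under llama_index.core.*
from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters
from llama_index.retrievers import VectorIndexRetriever
from llama_index.query_engine import RetrieverQueryEngine

filters = MetadataFilters(filters=[ExactMatchFilter(key="source", value="https://msrc.microsoft.com/update-guide/vulnerability/CVE-2023-4351")])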
retriever = VectorIndexRetriever(
    index=vector_store_indicies['msrc_security_update'],
    similarity_top_k=5,
    metadata_filters=filters
)
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    node_postprocessors=[metadata_replace]
)
response = query_engine.query(
    "fully explain with details 'CVE-2023-4351'",
)
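
For context, here is a toy, pure-Python sketch of what an exact-match metadata filter is expected to do at retrieval time: keep only nodes whose `source` metadata equals the given URL, then rank the survivors by similarity and keep the top k. `ToyNode`, `exact_match_retrieve`, the second URL, and the 2-D "embeddings" are hypothetical stand-ins, not LlamaIndex or Chroma APIs; real vector stores usually apply the filter inside the similarity search itself rather than in a separate Python step.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    # Hypothetical stand-in for a retrieved node: text, embedding, metadata.
    text: str
    embedding: list[float]
    metadata: dict[str, str] = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def exact_match_retrieve(nodes, query_embedding, key, value, top_k=5):
    # 1. Exact-match filter: keep only nodes whose metadata[key] == value.
    candidates = [n for n in nodes if n.metadata.get(key) == value]
    # 2. Rank the surviving nodes by similarity to the query, best first.
    candidates.sort(key=lambda n: cosine(n.embedding, query_embedding), reverse=True)
    # 3. Keep at most top_k of them.
    return candidates[:top_k]


CVE_URL = "https://msrc.microsoft.com/update-guide/vulnerability/CVE-2023-4351"
OTHER_URL = "https://example.com/some-other-advisory"  # hypothetical

nodes = [
    ToyNode("CVE-2023-4351: use after free in Network", [1.0, 0.1], {"source": CVE_URL}),
    ToyNode("Similar-looking text from another advisory", [1.0, 0.0], {"source": OTHER_URL}),
    ToyNode("CVE-2023-4351: affected products", [0.6, 0.8], {"source": CVE_URL}),
]
query = [1.0, 0.0]  # hypothetical query embedding

# Without the filter, the other advisory is the single closest match.
unfiltered = sorted(nodes, key=lambda n: cosine(n.embedding, query), reverse=True)
# With the filter, only chunks from the CVE page can come back.
filtered = exact_match_retrieve(nodes, query, key="source", value=CVE_URL, top_k=5)

print(unfiltered[0].metadata["source"])  # https://example.com/some-other-advisory
print([n.text for n in filtered])
```

The unfiltered ranking reproduces the symptom described in the question (a closely matching chunk from the wrong document wins), while the filtered call can only return chunks whose `source` is the CVE page.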

ttheta

As a quick update: if I query the Chroma collection directly with the CVE string, I get back all the correct nodes:

data = collection.query(
    query_texts='Chromium: CVE-2023-4351 Use after free in Network',
    n_results=5,
    where_document={'$contains': 'CVE-2023-4351'},
    include=['metadatas', 'distances'],
)
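
One detail worth keeping in mind when comparing the two approaches: in Chroma, a `where_document={'$contains': ...}` condition matches on the stored document text, whereas a metadata filter on `source` corresponds to a `where` condition on metadata (for example `where={'source': '<url>'}`). So the direct Chroma query above is not exactly the same check as the `ExactMatchFilter(key="source", ...)` filter. The toy sketch below emulates the two semantics in plain Python to show how they can select different chunks; `Record`, `where_eq`, `where_document_contains`, and the release-notes URL are hypothetical and not part of Chroma's API.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical stand-in for one stored chunk: its text and its metadata.
    document: str
    metadata: dict


def where_eq(records, key, value):
    # Emulates a metadata equality condition, like where={key: value}.
    return [r for r in records if r.metadata.get(key) == value]


def where_document_contains(records, needle):
    # Emulates a document-text condition, like where_document={"$contains": needle}.
    return [r for r in records if needle in r.document]


CVE_URL = "https://msrc.microsoft.com/update-guide/vulnerability/CVE-2023-4351"

records = [
    Record("Chromium: CVE-2023-4351 Use after free in Network", {"source": CVE_URL}),
    Record("Customers should apply the latest browser update.", {"source": CVE_URL}),
    Record("Release notes listing CVE-2023-4351 among other fixes.",
           {"source": "https://example.com/release-notes"}),  # hypothetical page
]

by_metadata = where_eq(records, "source", CVE_URL)
by_text = where_document_contains(records, "CVE-2023-4351")
print(len(by_metadata), len(by_text))  # 2 2
```

Both conditions return two chunks here, but not the same two: the metadata filter keeps the CVE page's chunk that never spells out the CVE string, while the text filter instead picks up a chunk from an unrelated page that merely mentions it.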

ttheta

So I'm wondering if I'm misusing something or doing something wrong on the Llama Index side of things? Chroma has the correct node data, but for some reason the Llama Index query_engine isn't returning the correct nodes.

Logan M

I think the correct filter option is filters=filters, not metadata_filters
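
Why didn't `metadata_filters=filters` raise an error? A likely explanation, stated here as an assumption rather than verified for every release, is that the retriever's constructor also accepts extra `**kwargs`, so a misnamed keyword is silently collected and no filter is ever applied. The toy classes below reproduce that pitfall and show one defensive pattern (rejecting unknown keywords); `ToyRetriever` and `StrictToyRetriever` are hypothetical and are not LlamaIndex classes.

```python
class ToyRetriever:
    # Hypothetical retriever whose constructor accepts **kwargs;
    # a misnamed keyword lands in kwargs and is never used.
    def __init__(self, similarity_top_k=5, filters=None, **kwargs):
        self.similarity_top_k = similarity_top_k
        self.filters = filters  # the only name the retriever actually reads
        self._extra = kwargs    # silently collected, never consulted


class StrictToyRetriever(ToyRetriever):
    # Defensive variant: refuse keywords the class does not understand.
    def __init__(self, similarity_top_k=5, filters=None, **kwargs):
        if kwargs:
            raise TypeError(f"unexpected keyword arguments: {sorted(kwargs)}")
        super().__init__(similarity_top_k=similarity_top_k, filters=filters)


wanted = {"source": "https://msrc.microsoft.com/update-guide/vulnerability/CVE-2023-4351"}

wrong = ToyRetriever(similarity_top_k=5, metadata_filters=wanted)
right = ToyRetriever(similarity_top_k=5, filters=wanted)
print(wrong.filters is None, right.filters is wanted)  # True True

try:
    StrictToyRetriever(similarity_top_k=5, metadata_filters=wanted)
except TypeError as err:
    print(err)  # unexpected keyword arguments: ['metadata_filters']
```

In the original snippet the fix is only the keyword: `VectorIndexRetriever(index=vector_store_indicies['msrc_security_update'], similarity_top_k=5, filters=filters)`. With the real class, Python's standard `inspect.signature(VectorIndexRetriever)` lists the parameter names the constructor accepts, which is a quick way to sanity-check a keyword suggested by a chatbot.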

ttheta

@Logan M thanks so much, that seems to have worked. As an FYI, I got the code snippets from the Llama Index chatbot...it seems to give wrong code examples a lot. Although I'm a total newbie, I would be happy to volunteer to do some legwork for the chatbot if it'll help improve its responses.

Logan M

lol we actually don't maintain that chatbot. The one on Discord is made by kapa.ai, the one on the docs is made by Mendable

ttheta

lol...alrighty then

ttheta

lmao

Logan M

😆
