Question about retrievers and metadata filters. I'm trying to use metadata filters with a query engine to get the correct nodes, because the results are often wrong. All of my source documents have a metadata key "source" that contains the URL of the document. I tried the following code with a RetrieverQueryEngine, but the results don't appear to be filtered: I'm still getting back nodes from the wrong documents. Can someone let me know if the code is implemented correctly?
filters = MetadataFilters(filters=[ExactMatchFilter(key="source", value="https://msrc.microsoft.com/update-guide/vulnerability/CVE-2023-4351")])
retriever = VectorIndexRetriever(
    index=vector_store_indicies['msrc_security_update'],
    similarity_top_k=5,
    metadata_filters=filters
)
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    node_postprocessors=[metadata_replace]
)
response = query_engine.query(
    "fully explain with details 'CVE-2023-4351'",
)
As a quick update, if I query the chromadb collection directly with the CVE string, I get back all the correct nodes:
data = collection.query(
    query_texts='Chromium: CVE-2023-4351 Use after free in Network',
    n_results=5,
    where_document={'$contains': 'CVE-2023-4351'},
    include=['metadatas', 'distances']
)
So I'm wondering if I'm doing something wrong on the LlamaIndex side of things. Chroma has the correct node data, but for some reason the LlamaIndex query_engine isn't returning the correct nodes.
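For anyone reading along, here is a toy sketch of one way this symptom can arise in general: a constructor that accepts extra keyword arguments will not raise an error for a misspelled or unexpected parameter name, so the filter is dropped without any warning. `ToyRetriever`, `Node`, and the example.com URLs are hypothetical stand-ins written for illustration, not LlamaIndex classes or real data, and whether the installed `VectorIndexRetriever` behaves this way is an assumption worth checking against its signature.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Node:
    """Hypothetical node: some text plus a metadata dict."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


class ToyRetriever:
    """Hypothetical stand-in for a retriever; NOT the LlamaIndex class."""

    def __init__(self, nodes: List[Node], top_k: int = 5,
                 filters: Optional[Dict[str, str]] = None,
                 **kwargs: Any) -> None:
        self.nodes = nodes
        self.top_k = top_k
        self.filters = filters or {}
        # Unrecognized keyword arguments end up here and are never consulted.
        self.ignored = kwargs

    def retrieve(self) -> List[Node]:
        # Exact-match filtering: every filter key must equal the node's value.
        matches = [
            n for n in self.nodes
            if all(n.metadata.get(k) == v for k, v in self.filters.items())
        ]
        return matches[: self.top_k]


nodes = [
    Node("CVE-2023-4351 details", {"source": "https://example.com/a"}),
    Node("unrelated advisory", {"source": "https://example.com/b"}),
]
wanted = {"source": "https://example.com/a"}

# Wrong keyword: accepted without error, so nothing is filtered.
unfiltered = ToyRetriever(nodes, metadata_filters=wanted).retrieve()
# Expected keyword: the exact-match filter is applied.
filtered = ToyRetriever(nodes, filters=wanted).retrieve()
print(len(unfiltered), len(filtered))
```

If this is what is happening, checking the retriever's constructor signature for the exact name of the filter parameter would be the first thing to try.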
@Logan M thanks so much, that seems to have worked. As an FYI, I got the code snippets from the LlamaIndex chatbot, which seems to give wrong code examples a lot. Although I'm a total newbie, I would be happy to volunteer to do some legwork for the chatbot if it'll help improve its responses.