
Updated 5 months ago

Metadata

At a glance

The community member is trying to use metadata filtering with both ChromaDB and LlamaIndex, but is having trouble with the LlamaIndex implementation. They can constrain the search in ChromaDB using a list of file names, but can't figure out how to do the same in LlamaIndex.

Other community members suggest that LlamaIndex has updated its metadata filtering capabilities, including new operators like GT, LT, IN, and NIN. However, they note that the "value" parameter may not support a list, unlike ChromaDB. One community member mentions seeing a related PR, but hasn't tried it themselves.

After some back-and-forth, a community member reports that they were able to work around the issue by using an OR condition to simulate the IN operator, which seems to have resolved the problem.

I've looked through previous questions around metadata searching, but can't seem to find a decent source to help me get what I need. I'm currently splitting my searches between a direct chromadb search and a llamaindex search. If the user only wants the actual chunk source, then I use chromadb directly. If they want an interpretation through an LLM narrator, then I pass it to llamaindex.

I can constrain which specific source files are searched with chromadb using the following:

print(f"****************** Only searching through documents {resourceArray}")
results = collection.query(
    query_texts=[queryText],
    where={"file_name": {"$in": resourceArray}},
    include=["metadatas", "documents", "distances"],
    n_results=3,
)

In this case, resourceArray simply holds the original file names for the content that was ingested.
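
For anyone reusing the query above, here is a minimal sketch of building that `where` clause from the file list. The helper name `build_file_where` is hypothetical, and returning `None` for an empty list (so the caller can skip filtering) is my own assumption, not something from the original post.

```python
from typing import Optional


def build_file_where(file_names: list[str], key: str = "file_name") -> Optional[dict]:
    """Build a Chroma-style `where` clause restricting results to the given files.

    Hypothetical helper: returns None for an empty list so the caller can
    decide whether to skip the filter or skip the query altogether.
    """
    # Drop duplicates while preserving the original order.
    unique = list(dict.fromkeys(file_names))
    if not unique:
        return None
    return {key: {"$in": unique}}
```

The result would be passed as `where=build_file_where(resourceArray)` in the `collection.query(...)` call shown above.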

I can't work out how to do this with llamaindex. I've seen this source example, but can't work out how to use a list filter along the same lines as chromadb.

query_engine = index.as_query_engine(
    service_context=service_context,
    similarity_top_k=3,
    vector_store_query_mode="default",
    filters=MetadataFilters(
        filters=[
            ExactMatchFilter(key="name", value="paul graham"),
        ]
    ),
    alpha=None,
    doc_ids=None,
)

Have I missed something?
6 comments
Our metadata filtering stuff has been updated slightly

There are added operators, including GT, LT, IN, NIN, etc.

I don't think the "value" can be a list though. I think chroma is the only one supporting this syntax 🤔 I remember seeing a PR related to this, so maybe it works? I haven't tried yet

https://docs.llamaindex.ai/en/stable/examples/vector_stores/chroma_metadata_filter.html#multiple-metadata-filters-with-or-condition
Thanks @Logan M - just replying in the GitHub list for visibility. Almost there, but there is an attribute issue with IN
What's the attribute error?
Now working - used OR to simulate an IN
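
The thread doesn't include the final code, so the following is only a sketch of what "OR to simulate an IN" might look like: one equality filter per file name, combined with an OR condition. The helper names `expand_in` and `to_llama_filters` are hypothetical, and the import path assumes a recent llama-index release (0.10 or later); older versions keep these classes elsewhere.

```python
def expand_in(key, values):
    """Expand an IN-style list into (key, value) pairs, one per distinct value."""
    # dict.fromkeys removes duplicates while keeping the original order.
    return [(key, value) for value in dict.fromkeys(values)]


def to_llama_filters(pairs):
    """Combine the pairs into equality filters joined by an OR condition."""
    # Imported lazily so expand_in stays usable without llama-index installed.
    # Assumption: llama-index >= 0.10 module layout.
    from llama_index.core.vector_stores import (
        FilterCondition,
        FilterOperator,
        MetadataFilter,
        MetadataFilters,
    )

    return MetadataFilters(
        filters=[
            MetadataFilter(key=key, value=value, operator=FilterOperator.EQ)
            for key, value in pairs
        ],
        condition=FilterCondition.OR,
    )
```

The result would go into the `filters=` argument of `index.as_query_engine(...)`, for example `filters=to_llama_filters(expand_in("file_name", resourceArray))`.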
I think maybe you just needed to update your llama-index version? But glad it seemed to work!