
Updated last year

Can I use MetadataFilters with ExactMatchFilter to match multiple values for the same key? I want to filter my index down to a set of documents.
I want to filter on a set of documents like this:
Plain Text
def index_to_query_engine(conversation_docs: List[str], index: VectorStoreIndex) -> BaseQueryEngine:
    # NOTE: despite the List[str] hint, each element must expose an `.id` attribute.
    doc_ids = [str(doc.id) for doc in conversation_docs]
    # One ExactMatchFilter per document ID, all on the same metadata key.
    filters = MetadataFilters(
        filters=[ExactMatchFilter(key=DB_DOC_ID_KEY, value=doc_id) for doc_id in doc_ids]
    )
    kwargs = {"similarity_top_k": 3, "filters": filters}
    return index.as_query_engine(**kwargs)
This looks like it's setting up the filters correctly:
Plain Text
INFO:app.chat.engine:vector_query_engine_tools is: [
    {
        "_metadata": {
            "description": "...",
            "fn_schema": "...",
            "name": "..."
        },
        "_query_engine": {
            "_node_postprocessors": [],
            "_response_synthesizer": "...",
            "_retriever": "...'_filters': MetadataFilters(filters=[ExactMatchFilter(key='db_document_id', value='0a427515-c6d7-4fe4-9cc2-e6078eb6001b'), ExactMatchFilter(key='db_document_id', value='27b0b587-9629-41e2-a8e5-7281a0e6f300'), ExactMatchFilter(key='db_document_id', value='7a6c32ef-5bc3-44f9-ae24-0b428cc36a00'), ExactMatchFilter(key='db_document_id', value='d93366e0-9362-48ab-89bf-1f986f5f6a9a'), ExactMatchFilter(key='db_document_id', value='eb163ad6-e4aa-4a88-a643-40cc1c4352fb'), ExactMatchFilter(key='db_document_id', value='f356a86b-2365-4cd0-b6b9-d7de13622bbc')]), '_kwargs': {}}",
            "callback_manager": "..."
        }
    }
]
The issue is that it only seems to be grabbing citations/sources/documents for the last value:
Plain Text
INFO:app.schema:QuestionAnswerPair.from_sub_question_answer_pair: citations: [
    {
        "document_id": "f356a86b-2365-4cd0-b6b9-d7de13622bbc",
        "page_number": 1,
        "score": 0.8420158436317318,
        "text": "Underwood, Susan Ardmore..."
    }
]
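One possible explanation for seeing only the last document, offered as an assumption rather than something confirmed in this thread: some vector store integrations fold the list of exact-match filters into a dict keyed by the metadata key, so repeated keys overwrite each other. The sketch below uses plain tuples and a hypothetical `collapse_filters` helper (not LlamaIndex code) to show the effect:

```python
from typing import Dict, List, Tuple


def collapse_filters(filters: List[Tuple[str, str]]) -> Dict[str, str]:
    """Hypothetical helper: fold (key, value) pairs into a dict.

    Repeated keys overwrite earlier entries, so only the last value survives.
    """
    collapsed: Dict[str, str] = {}
    for key, value in filters:
        collapsed[key] = value
    return collapsed


# Two filters on the same key, as in the query engine dump above.
pairs = [
    ("db_document_id", "0a427515-c6d7-4fe4-9cc2-e6078eb6001b"),
    ("db_document_id", "f356a86b-2365-4cd0-b6b9-d7de13622bbc"),
]
collapsed = collapse_filters(pairs)
```

If the store you use behaves this way, `collapsed` ends up holding a single entry for `db_document_id`, which would match the single citation in the log.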
Yes, afaik LlamaIndex does not natively support multiple conditions on the same key, but you can do this with Pinecone - you just need to supply the condition as a JSON-style dict rather than as Python filter objects:

Plain Text
def filter_by_document_id_and_education(document_id):
    return {
        "filter": {
            "$and": [
                {"uuid": str(document_id)},
                {"contains_education": 1},
            ]
        }
    }
In theory some other vector stores also support complex metadata filtering, but only Pinecone has docs for it.
So you could just use the Mongo-style $in operator to achieve this for your use case.
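A minimal sketch of what the `$in` suggestion could look like, reusing the `uuid` metadata key from the example above. The function name `filter_by_document_ids` is hypothetical, and whether your vector store accepts `$in` is something to check against its own docs:

```python
from typing import Any, Dict, Iterable


def filter_by_document_ids(document_ids: Iterable[Any]) -> Dict[str, Any]:
    """Build a Mongo-style filter matching any of the given document IDs."""
    return {
        "filter": {
            # $in matches when the stored value equals any item in the list.
            "uuid": {"$in": [str(doc_id) for doc_id in document_ids]},
        }
    }


example = filter_by_document_ids(
    ["0a427515-c6d7-4fe4-9cc2-e6078eb6001b", "f356a86b-2365-4cd0-b6b9-d7de13622bbc"]
)
```

This keeps a single condition on the key with a list of allowed values, instead of several exact-match conditions on the same key.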
Thanks for the rec!! I'm using a pgvector DB, and it turns out it has undocumented support for "in": https://github.com/langchain-ai/langchain/issues/9726#issuecomment-1705465285

Here's my updated function, which seems to be correctly pulling multiple documents now!
Plain Text
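def index_to_query_engine(conversation_docs: List[str], index: VectorStoreIndex) -> BaseQueryEngine:
    # NOTE: despite the List[str] hint, each element must expose an `.id` attribute.
    doc_ids = [str(doc.id) for doc in conversation_docs]
    # Single "in" condition: match any of the listed document IDs.
    filters = {DB_DOC_ID_KEY: {"in": doc_ids}}
    kwargs = {"similarity_top_k": 3, "filter": filters}
    return index.as_query_engine(**kwargs)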