
Updated 2 months ago

Is there an example notebook showcasing

Is there an example notebook showcasing the use of approximate metadata filtering? For example, I am using workflows for RAG and I'd like to include approximate metadata filtering for better retrieval accuracy.

Plain Text
custom_index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
)


class RAGWorkflow(Workflow):
    
    @step
    async def ingest(self, ctx: Context, ev: StartEvent) -> StopEvent | None:
        """Entry point - ingest documents"""
        
        index = custom_index
        
        return StopEvent(result=index)

    @step
    async def retrieve(self, ctx: Context, ev: StartEvent) -> RetrieverEvent | None:
        "Entry point for RAG, triggered by a StartEvent with `query`."
        query = ev.get("query")
        index = ev.get("index")

        if not query:
            return None

        # store the query and the index in the global context
        await ctx.set("query", query)
        await ctx.set("index", index)

        # the index must be passed in with the StartEvent
        if index is None:
            print("Index is empty, load some documents before querying!")
            return None

        retriever = index.as_retriever(similarity_top_k=10)
        nodes = await retriever.aretrieve(query)
    
        return RetrieverEvent(nodes=nodes)
7 comments
What is approximate metadata filtering?

Sounds like something that not every vector store would support?
Well, when generating documents for ingestion, I specified the metadata:

Plain Text
documents = [
    Document(
        text=f"Scope: {row['Scope']}, Level_1: {row['Level 1']}, Level_2: {row['Level 2']}, Level_3: {row['Level 3']}",
        metadata={"scope": row['Scope']},
    )
    for _, row in df.iterrows()
]


So, when retrieving, I'd like to map a user query to metadata filters and then do the retrieval. The filters may not be exact, though.
I am using MilvusVectorStore.
They kind of have to be exact for metadata filtering to work? I don't think Milvus supports "approximate" filters.

You'd need a step to get the LLM to write the filters, I think.
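
Since the store needs exact values, one way to tolerate an inexact value (whether typed by the user or written by the LLM) is to snap it to the closest value that actually exists in the metadata before building the filter. The sketch below uses Python's standard `difflib` for that. `KNOWN_SCOPES`, `snap_to_known_value` and the 0.6 cutoff are illustrative assumptions, not part of LlamaIndex or Milvus.

```python
from difflib import get_close_matches
from typing import List, Optional

# Hypothetical list of the exact `scope` values stored in the index metadata,
# e.g. collected from df["Scope"].unique() at ingestion time.
KNOWN_SCOPES = ["Finance", "Human Resources", "Engineering"]


def snap_to_known_value(
    candidate: str, known: List[str], cutoff: float = 0.6
) -> Optional[str]:
    """Return the stored value closest to `candidate`, or None if nothing is close."""
    # Compare case-insensitively, but return the value exactly as it is stored.
    lowered = {value.lower(): value for value in known}
    matches = get_close_matches(
        candidate.strip().lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None
```

If nothing is close enough, the function returns `None`, in which case it is probably safer to retrieve without a filter than to filter on a guess.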
I see.

In my case, I'd like to dynamically construct exact metadata filters based on the user query. Is there an example notebook you can point me to?
In the code below, the filters are predefined, and I'd like to construct them based on the user query instead.

Plain Text
from llama_index.core.vector_stores import (
    MetadataFilter,
    MetadataFilters,
    FilterOperator,
)

filters = MetadataFilters(
    filters=[
        MetadataFilter(key="theme", operator=FilterOperator.EQ, value="Mafia"),
    ]
)


https://docs.llamaindex.ai/en/stable/examples/vector_stores/Qdrant_metadata_filter/
I would define a Pydantic object and get the LLM to predict it, and then translate that into filters.

This way, you can limit the scope of the filters to only things that you know are filterable.
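
A rough sketch of the translation half of that idea, under a few assumptions: `PredictedFilters` is a plain dataclass standing in for the Pydantic model the LLM would fill in, and `ALLOWED_VALUES`, `to_filter_specs` and `to_metadata_filters` are hypothetical names. The LLM call itself is left out. Only the `MetadataFilter`, `MetadataFilters` and `FilterOperator` imports come from the snippet earlier in the thread.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical whitelist: only these metadata keys and values may be filtered on.
ALLOWED_VALUES = {"scope": {"Finance", "Human Resources", "Engineering"}}


@dataclass
class PredictedFilters:
    """Stand-in for the Pydantic model the LLM would be asked to predict."""

    scope: Optional[str] = None


def to_filter_specs(predicted: PredictedFilters) -> List[Tuple[str, str]]:
    """Keep only whitelisted (key, value) pairs; silently drop everything else."""
    specs = []
    for key, allowed in ALLOWED_VALUES.items():
        value = getattr(predicted, key, None)
        if value in allowed:
            specs.append((key, value))
    return specs


def to_metadata_filters(specs: List[Tuple[str, str]]):
    """Translate (key, value) pairs into exact-match LlamaIndex filters."""
    # Imported lazily so the helpers above can be used without llama_index installed.
    from llama_index.core.vector_stores import (
        FilterOperator,
        MetadataFilter,
        MetadataFilters,
    )

    return MetadataFilters(
        filters=[
            MetadataFilter(key=key, operator=FilterOperator.EQ, value=value)
            for key, value in specs
        ]
    )
```

In the workflow's `retrieve` step, the result could then be passed along as `index.as_retriever(similarity_top_k=10, filters=to_metadata_filters(specs))`, skipping the `filters` argument when `specs` comes back empty.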