Updated 4 months ago

What is the best way to filter the results from index.as_retriever().retrieve()?

At a glance
The community members discuss several ways to filter the results from index.as_retriever().retrieve() in LlamaIndex:

1. Metadata Filters: Applying metadata-based filters when constructing the retriever (or query engine).

2. Similarity Postprocessor: Filtering nodes below a certain similarity score.

3. Qdrant Filters: Using Qdrant filters if using Qdrant as the vector store.

The community members also discuss how to create custom postprocessors and postprocess based on metadata in LlamaIndex.

Please stand by @Callam as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
There are several ways to filter the results from index.as_retriever().retrieve() in LlamaIndex. Here are a few examples:

  1. Metadata Filters: You can apply filters to the query engine based on the metadata of your documents. Here is an example of how to do it:
Plain Text
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# Metadata filter
filters = MetadataFilters(
    filters=[ExactMatchFilter(key="theme", value="Mafia")]
)
retriever = index.as_retriever(filters=filters)

results = retriever.retrieve("What is inception about?")

# print each retrieved node's score and text
for node in results:
    print(node.score, node.text)

(Source: LlamaIndex Documentation)

  2. Similarity Postprocessor: You can use a SimilarityPostprocessor to filter out nodes below a certain similarity score. Here is an example:
Plain Text
from llama_index.core.postprocessor import SimilarityPostprocessor

nodes = index.as_retriever().retrieve("test query str")

# filter nodes below 0.75 similarity score
processor = SimilarityPostprocessor(similarity_cutoff=0.75)
filtered_nodes = processor.postprocess_nodes(nodes)

(Source: LlamaIndex Documentation)

  3. Qdrant Filters: If you are using Qdrant as your vector store, you can use Qdrant filters. Here is an example:
Plain Text
from qdrant_client.models import FieldCondition, Filter, MatchValue

# match (fruit=apple AND city=Tokyo) OR (fruit=grape AND city=Toronto)
filters = Filter(
    should=[
        Filter(
            must=[
                FieldCondition(
                    key="fruit",
                    match=MatchValue(value="apple"),
                ),
                FieldCondition(
                    key="city",
                    match=MatchValue(value="Tokyo"),
                ),
            ]
        ),
        Filter(
            must=[
                FieldCondition(
                    key="fruit",
                    match=MatchValue(value="grape"),
                ),
                FieldCondition(
                    key="city",
                    match=MatchValue(value="Toronto"),
                ),
            ]
        ),
    ]
)

retriever = index.as_retriever(vector_store_kwargs={"qdrant_filters": filters})

response = retriever.retrieve("Who makes grapes?")
for node in response:
    print("score:", node.score)
    print("text:", node.text)
    print("metadata:", node.metadata)

(Source: LlamaIndex Documentation)

The best way to filter results depends on your specific use case and the nature of your data.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai is there a way to create custom post processors or a way to post process on metadata?
Let me check the knowledge base for you. One moment! :books:
----------------------
Yes, you can create custom postprocessors in LlamaIndex. The base class for creating a custom postprocessor is BaseNodePostprocessor. Here is an example of a simple custom postprocessor:

Plain Text
from typing import List, Optional

from llama_index.core import QueryBundle
from llama_index.core.postprocessor.types import BaseNodePostprocessor
from llama_index.core.schema import NodeWithScore

class DummyNodePostprocessor(BaseNodePostprocessor):
    def _postprocess_nodes(
        self, nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle]
    ) -> List[NodeWithScore]:
        # subtracts 1 from the score
        for n in nodes:
            n.score -= 1

        return nodes

(Source: LlamaIndex Documentation)
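Building on that pattern, a custom postprocessor can also filter on metadata rather than adjust scores. Below is a minimal, self-contained sketch of the logic you would put inside `_postprocess_nodes`; the `SimpleNode` class and `filter_by_metadata` function are illustrative stand-ins, not LlamaIndex APIs. In real code, the same loop would go inside a `BaseNodePostprocessor` subclass as in the example above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SimpleNode:
    """Stand-in for a retrieved node: just text, a score, and metadata."""
    text: str
    score: float
    metadata: Dict[str, Any] = field(default_factory=dict)


def filter_by_metadata(
    nodes: List[SimpleNode], required_metadata: Dict[str, Any]
) -> List[SimpleNode]:
    """Keep only nodes whose metadata contains every required key/value pair.

    This is the body you would place in a custom postprocessor's
    _postprocess_nodes method.
    """
    return [
        n
        for n in nodes
        if all(n.metadata.get(k) == v for k, v in required_metadata.items())
    ]


nodes = [
    SimpleNode("Inception plot summary", 0.9, {"theme": "Mafia"}),
    SimpleNode("Unrelated chunk", 0.8, {"theme": "Sci-Fi"}),
]

kept = filter_by_metadata(nodes, {"theme": "Mafia"})
print([n.text for n in kept])  # only the "Mafia" node survives
```

This drops non-matching nodes entirely, which is different from the metadata filters shown earlier (applied at retrieval time in the vector store) — a postprocessor runs after retrieval, so it works regardless of which vector store you use.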

For postprocessing based on metadata, you can use the MetadataReplacementPostProcessor. This postprocessor replaces the node content with a field from the node metadata. If the field is not present in the metadata, then the node text remains unchanged. Here is an example of how to use it:

Plain Text
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

postprocessor = MetadataReplacementPostProcessor(
    target_metadata_key="window",
)

postprocessor.postprocess_nodes(nodes)

(Source: LlamaIndex Documentation)
----------------------