
Updated 2 months ago

What is the best way to filter the results from index.as_retriever().retrieve()?

6 comments
Please stand by @Callam as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
There are several ways to filter the results from index.as_retriever().retrieve() in LlamaIndex. Here are a few examples:

  1. Metadata Filters: You can pass filters to the retriever so that only documents whose metadata matches are searched. Here is an example of how to do it:
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# Metadata filter
filters = MetadataFilters(
    filters=[ExactMatchFilter(key="theme", value="Mafia")]
)
retriever = index.as_retriever(filters=filters)

results = retriever.retrieve("What is inception about?")
# print each retrieved node's score and metadata
for result in results:
    print(result.score, result.metadata)

(Source: LlamaIndex Documentation)

  2. Similarity Postprocessor: You can use a SimilarityPostprocessor to filter nodes below a certain similarity score. Here is an example:
from llama_index.core.postprocessor import SimilarityPostprocessor

nodes = index.as_retriever().retrieve("test query str")

# filter nodes below 0.75 similarity score
processor = SimilarityPostprocessor(similarity_cutoff=0.75)
filtered_nodes = processor.postprocess_nodes(nodes)

(Source: LlamaIndex Documentation)
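
To make the cutoff idea concrete, here is a minimal plain-Python sketch of that kind of score filtering. It does not use LlamaIndex at all: ScoredNode and apply_similarity_cutoff are hypothetical names made up for this illustration, and treating a missing score as "drop the node" is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoredNode:
    """Hypothetical stand-in for a retrieved node; not a LlamaIndex class."""

    text: str
    score: Optional[float]


def apply_similarity_cutoff(nodes: List[ScoredNode], cutoff: float) -> List[ScoredNode]:
    """Keep only nodes that have a score and whose score is at least `cutoff`."""
    return [n for n in nodes if n.score is not None and n.score >= cutoff]


nodes = [ScoredNode("a", 0.91), ScoredNode("b", 0.60), ScoredNode("c", None)]
kept = apply_similarity_cutoff(nodes, cutoff=0.75)
print([n.text for n in kept])
```

Because this runs after retrieval, a strict cutoff can leave you with fewer nodes than you asked the retriever for, or none at all.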

  3. Qdrant Filters: If you are using Qdrant as your vector store, you can use Qdrant filters. Here is an example:
from qdrant_client.http.models import FieldCondition, Filter, MatchValue

filters = Filter(
    should=[
        Filter(
            must=[
                FieldCondition(
                    key="fruit",
                    match=MatchValue(value="apple"),
                ),
                FieldCondition(
                    key="city",
                    match=MatchValue(value="Tokyo"),
                ),
            ]
        ),
        Filter(
            must=[
                FieldCondition(
                    key="fruit",
                    match=MatchValue(value="grape"),
                ),
                FieldCondition(
                    key="city",
                    match=MatchValue(value="Toronto"),
                ),
            ]
        ),
    ]
)

retriever = index.as_retriever(vector_store_kwargs={"qdrant_filters": filters})

response = retriever.retrieve("Who makes grapes?")
for node in response:
    print("node", node.score)
    print("node", node.text)
    print("node", node.metadata)

(Source: LlamaIndex Documentation)
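
The nested filter above reads as "(fruit is apple AND city is Tokyo) OR (fruit is grape AND city is Toronto)": each must group requires all of its conditions, and should accepts any one group. A small plain-Python sketch of that boolean logic, with no Qdrant involved (MUST_GROUPS and payload_matches are hypothetical names used only for illustration):

```python
from typing import Any, Dict, List

# Each inner dict is one "must" group: every key/value pair has to match.
# The outer list plays the role of "should": matching any one group is enough.
MUST_GROUPS: List[Dict[str, Any]] = [
    {"fruit": "apple", "city": "Tokyo"},
    {"fruit": "grape", "city": "Toronto"},
]


def payload_matches(payload: Dict[str, Any], groups: List[Dict[str, Any]]) -> bool:
    """Return True if the payload satisfies all conditions of at least one group."""
    return any(
        all(payload.get(key) == value for key, value in group.items())
        for group in groups
    )


print(payload_matches({"fruit": "apple", "city": "Tokyo"}, MUST_GROUPS))
print(payload_matches({"fruit": "apple", "city": "Toronto"}, MUST_GROUPS))
```

The first payload satisfies the first group, while the second mixes values from both groups and satisfies neither.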

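Which approach fits best depends on your use case: metadata filters and Qdrant filters restrict the search at retrieval time, while SimilarityPostprocessor runs afterwards on the nodes that were already retrieved.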
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai is there a way to create custom post processors or a way to post process on metadata?
Let me check the knowledge base for you. One moment! :books:
----------------------
Yes, you can create custom postprocessors in LlamaIndex. The base class for creating a custom postprocessor is BaseNodePostprocessor. Here is an example of a simple custom postprocessor:

from typing import List, Optional

from llama_index.core import QueryBundle
from llama_index.core.postprocessor.types import BaseNodePostprocessor
from llama_index.core.schema import NodeWithScore

class DummyNodePostprocessor(BaseNodePostprocessor):
    def _postprocess_nodes(
        self, nodes: List[NodeWithScore], query_bundle: Optional[QueryBundle]
    ) -> List[NodeWithScore]:
        # subtracts 1 from the score
        for n in nodes:
            n.score -= 1

        return nodes

(Source: LlamaIndex Documentation)

For postprocessing based on metadata, you can use the MetadataReplacementPostProcessor. This postprocessor replaces the node content with a field from the node metadata. If the field is not present in the metadata, then the node text remains unchanged. Here is an example of how to use it:

from llama_index.core.postprocessor import MetadataReplacementPostProcessor

postprocessor = MetadataReplacementPostProcessor(
    target_metadata_key="window",
)

# nodes is the list returned by retriever.retrieve(...)
new_nodes = postprocessor.postprocess_nodes(nodes)

(Source: LlamaIndex Documentation)
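
If the goal is to drop nodes based on their metadata rather than replace their text, one option is to put that check inside the _postprocess_nodes method of a custom BaseNodePostprocessor subclass like the one shown earlier. Below is a minimal plain-Python sketch of the filtering logic only. It does not import LlamaIndex: ScoredNode is a hypothetical stand-in exposing a metadata dict, and keep_by_metadata is an illustrative name, not a library function.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Sequence


@dataclass
class ScoredNode:
    """Hypothetical stand-in for a retrieved node; not a LlamaIndex class."""

    text: str
    score: Optional[float] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def keep_by_metadata(
    nodes: List[ScoredNode], key: str, allowed: Sequence[Any]
) -> List[ScoredNode]:
    """Keep nodes whose metadata has `key` set to one of the `allowed` values."""
    return [n for n in nodes if key in n.metadata and n.metadata[key] in allowed]


nodes = [
    ScoredNode("The Godfather", 0.8, {"theme": "Mafia"}),
    ScoredNode("Inception", 0.9, {"theme": "Fiction"}),
    ScoredNode("Untitled", 0.7),  # no metadata at all
]
filtered = keep_by_metadata(nodes, key="theme", allowed=["Mafia"])
print([n.text for n in filtered])
```

In this sketch, nodes that lack the key are dropped. Whether that is the right default depends on your data, so treat it as an assumption to revisit.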
----------------------