When using a custom Retriever when are filters run? is ...

At a glance

The community members are discussing the behavior of custom retrievers in the LlamaIndex library. The main points are:

1. Filters are typically applied during the retrieval process when using a custom retriever. The custom retriever can use objects that have already been filtered, but it's not limited to only those objects.

2. The community members discuss the timing of metadata filters, with some suggesting they should be applied before the retrieval step to maintain accuracy, especially in multi-tenant scenarios.

3. There is a discussion around the behavior of the include_text=True option, which seems to reduce the number of nodes and combine them into a single text string.

There is no explicitly marked answer in the provided information.

Callam

@kapa.ai When using a custom Retriever when are filters run? is the custom retriever only using objects that have already been filtered by the filter?

kapa.ai

In LlamaIndex, when using a custom retriever, filters are typically applied during the retrieval process. The retriever is capable of retrieving and running other objects, including other retrievers, query engines, query pipelines, and other nodes. This means that the custom retriever can use objects that have already been filtered by the filter, but it's not limited to only using those objects.

For example, the BM25 Retriever example extends the base retriever class to create a custom retriever that always uses both the vector retriever and the BM25 retriever. The retrieved nodes can then be re-ranked and filtered. This lets us keep intermediate top-k values large and let the re-ranking step filter out unneeded nodes.

In the case of the Elasticsearch Auto Retriever, metadata filters are inferred during the retrieval process, which helps with more precise retrieval. For instance, when retrieving information about movies directed by Andrei Tarkovsky, the retriever uses the query string "science fiction" and applies a filter for the director "Andrei Tarkovsky".

So, while filters can be used to refine the objects that the custom retriever works with, they are not the only mechanism for controlling the retrieval process. The custom retriever can also use other strategies, such as re-ranking, to further refine the results.
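
The "retrieve broadly, then re-rank" pattern described above can be sketched in plain Python. This is not LlamaIndex code: hybrid_retrieve, vector_stub, bm25_stub, rerank_stub and the relevance table are all hypothetical stand-ins, assumed only for illustration. The sketch shows a large intermediate top-k being merged across two retrievers and then cut down by a re-ranking step.

```python
from typing import Callable, Dict, List

# Hypothetical retriever signature: (query, k) -> list of node ids.
Retriever = Callable[[str, int], List[str]]


def hybrid_retrieve(
    query: str,
    retrievers: List[Retriever],
    rerank: Callable[[str, str], float],
    candidate_k: int = 10,
    final_k: int = 2,
) -> List[str]:
    """Merge candidates from several retrievers, then re-rank and truncate."""
    merged: Dict[str, bool] = {}
    for retrieve in retrievers:
        # Keep the intermediate top-k large so good candidates are not lost.
        for node_id in retrieve(query, candidate_k):
            merged[node_id] = True  # dict keys de-duplicate shared nodes
    # The re-ranking step decides which nodes survive.
    ranked = sorted(merged, key=lambda n: rerank(query, n), reverse=True)
    return ranked[:final_k]


# Stand-in retrievers and re-ranker (assumptions, not real library calls).
def vector_stub(query: str, k: int) -> List[str]:
    return ["a", "b", "c"][:k]


def bm25_stub(query: str, k: int) -> List[str]:
    return ["c", "d"][:k]


relevance = {"a": 0.2, "b": 0.9, "c": 0.7, "d": 0.1}


def rerank_stub(query: str, node_id: str) -> float:
    return relevance[node_id]


result = hybrid_retrieve("science fiction", [vector_stub, bm25_stub], rerank_stub)
print(result)
```

Note that nothing in this sketch applies a metadata filter. Whether filtering happens before or after the search depends on the retrievers being wrapped.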

Callam

@Logan M Are metadata filters run before or after the retrieval step? If it is after, wouldn't the inclusion of more information in a multi-tenant graph lead to a worse retrieval response?

Logan M

By "custom retriever" I'm not sure what you mean?

In almost all cases, it should be "before" retrieval (i.e. in a vector search, only performing the search after filtering)

Callam

like this:

Logan M

hmmm, there is no filtering in your retriever right now?

Callam

Yeah, I didn't add them directly to the retrievers inside of the custom retriever, but that's what I was kinda asking: do they run before the retrieval step?

Callam

I don't understand how multi-tenant retrieval doesn't become less and less accurate unless you filter before the retrieval step.

Callam

like if you have 3 relevant chunks with (name=chunk_1, score=1, org_id=1), (name=chunk_2, score=0.9, org_id=1), (name=chunk_3, score=0.8, org_id=2)

If you only return the top 2 results, let's say, and you filter after retrieval, then when querying from org 2 you would get the first 2 chunks, filter them both out by org_id, and end up with no context even though there was a useful chunk available.
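
The scenario above can be reproduced in a few lines of plain Python. This is only a sketch under stated assumptions: the helper names top_k, filter_after_retrieval and filter_before_retrieval are hypothetical, the scores are taken as given rather than computed, and no LlamaIndex API is involved.

```python
from typing import Dict, List

# The three chunks from the example, already scored against the query.
chunks: List[Dict] = [
    {"name": "chunk_1", "score": 1.0, "org_id": 1},
    {"name": "chunk_2", "score": 0.9, "org_id": 1},
    {"name": "chunk_3", "score": 0.8, "org_id": 2},
]


def top_k(candidates: List[Dict], k: int) -> List[Dict]:
    """Return the k highest-scoring candidates."""
    return sorted(candidates, key=lambda c: c["score"], reverse=True)[:k]


def filter_after_retrieval(candidates: List[Dict], org_id: int, k: int) -> List[Dict]:
    # Search across all tenants first, then drop chunks from other orgs.
    return [c for c in top_k(candidates, k) if c["org_id"] == org_id]


def filter_before_retrieval(candidates: List[Dict], org_id: int, k: int) -> List[Dict]:
    # Restrict the candidate set to the tenant first, then search within it.
    return top_k([c for c in candidates if c["org_id"] == org_id], k)


post = filter_after_retrieval(chunks, org_id=2, k=2)
pre = filter_before_retrieval(chunks, org_id=2, k=2)

print([c["name"] for c in post])  # []
print([c["name"] for c in pre])   # ['chunk_3']
```

With post-filtering, both top-2 slots are taken by org 1 chunks and org 2 gets nothing back. With pre-filtering, chunk_3 is returned.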

Callam

Also, why does setting include_text=True reduce the number of nodes and, like, combine nodes into one text string?
