
Updated last year


At a glance

The community member is asking for an example of how to integrate an LLM to generate filtering on metadata values, beyond the provided example of manually passing the filter values. The comments suggest using an "auto retriever" and provide links to relevant documentation. However, the community member faces issues with the maximum token limit and asks about supporting node splitting for long text content. The comments discuss chunking the data before indexing, attaching links to the full article in the metadata, and using the BaseToolSpec to define an API that filters articles based on keywords.

Hello, is there an example where the LLM itself generates the filters on metadata values? The only example I see is one where the user passes the values to the filter, e.g.:
from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters

filters = MetadataFilters(
    filters=[ExactMatchFilter(key="type", value="fruit")]
)
15 comments
What you describe is an auto retriever
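A rough sketch of the idea behind an auto retriever, in plain Python rather than the LlamaIndex API: the LLM is asked to turn the user's question into a query string plus metadata filters, and those inferred filters are applied in place of hand-written ones. `fake_llm`, `infer_filters`, `apply_exact_match`, and the sample nodes are hypothetical stand-ins; in LlamaIndex the corresponding class is `VectorIndexAutoRetriever`, which does the prompting and parsing for you.

```python
import json
from typing import Callable, Dict, List

# Hypothetical in-memory "index": each node has text plus metadata.
NODES = [
    {"text": "Apples are rich in fiber.", "metadata": {"type": "fruit"}},
    {"text": "Carrots are root vegetables.", "metadata": {"type": "vegetable"}},
    {"text": "Bananas contain potassium.", "metadata": {"type": "fruit"}},
]

PROMPT = (
    "Allowed metadata keys: {keys}\n"
    "Return JSON with fields 'query' and 'filters' for this question:\n{question}"
)


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; always returns the same structured answer."""
    return json.dumps({"query": "health benefits", "filters": {"type": "fruit"}})


def infer_filters(question: str, llm: Callable[[str], str], allowed_keys: List[str]) -> Dict:
    """Ask the LLM for a structured query and keep only filters on known keys."""
    raw = llm(PROMPT.format(keys=allowed_keys, question=question))
    parsed = json.loads(raw)
    filters = {k: v for k, v in parsed.get("filters", {}).items() if k in allowed_keys}
    return {"query": parsed.get("query", question), "filters": filters}


def apply_exact_match(nodes: List[Dict], filters: Dict) -> List[Dict]:
    """Keep nodes whose metadata matches every inferred key/value pair exactly."""
    return [
        n for n in nodes
        if all(n["metadata"].get(k) == v for k, v in filters.items())
    ]


spec = infer_filters("What are the health benefits of fruit?", fake_llm, ["type"])
matches = apply_exact_match(NODES, spec["filters"])
```

Restricting the parsed filters to `allowed_keys` is a deliberate choice here: it keeps a hallucinated metadata key from silently filtering everything out.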
Thanks! I'll give it a try.
@Logan M is there also a way to support node splitting when the text is long?
In case which text is long?
The content of a TextNode that I'm indexing
(I'm using a single TextNode for a single blog post)
Also, when using the filtering technique illustrated in your first link, I get this error:
Got output: Error: Error code: 400 - {'error': {'message': 'max_tokens is too large: 8192. This model supports at most 4096 completion tokens, whereas you provided 8192.', 'type': 'invalid_request_error', 'param': 'max_tokens', 'code': None}}
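The error itself is about the completion budget: the request asked for 8192 output tokens and the model accepts at most 4096. A minimal sketch of capping the value before it goes into the LLM configuration; `MODEL_COMPLETION_LIMIT` and `safe_max_tokens` are hypothetical names, and the 4096 figure is taken from the error message above.

```python
MODEL_COMPLETION_LIMIT = 4096  # from the error message; differs per model


def safe_max_tokens(requested: int, limit: int = MODEL_COMPLETION_LIMIT) -> int:
    """Clamp a requested completion budget to what the model accepts."""
    if requested <= 0:
        raise ValueError("max_tokens must be positive")
    return min(requested, limit)


# The value that triggered the 400 error is reduced to the model's limit.
max_tokens = safe_max_tokens(8192)
```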
Should probably be chunking your data before indexing it?
But then will it be possible to recover the original article?
If you need the full article, you can attach a link to it in the metadata (e.g. a file path or a URL)
Or you can chunk the data after retrieving
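A minimal sketch of chunking before indexing while keeping a way back to the full article: every chunk carries the same `source_url` in its metadata, so the original post can be looked up from any retrieved chunk. The word-based splitter, the default sizes, and the names `chunk_article` and `source_url` are illustrative assumptions, not LlamaIndex API.

```python
from typing import Dict, List


def chunk_article(
    text: str, source_url: str, chunk_size: int = 200, overlap: int = 20
) -> List[Dict]:
    """Split text into overlapping word chunks that all point to the full article."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: List[Dict] = []
    for idx, start in enumerate(range(0, len(words), step)):
        piece = words[start:start + chunk_size]
        chunks.append(
            {
                "text": " ".join(piece),
                # Same link on every chunk, plus its position in the article.
                "metadata": {"source_url": source_url, "chunk_index": idx},
            }
        )
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# Hypothetical 450-word blog post and URL.
article = " ".join(f"w{i}" for i in range(450))
chunks = chunk_article(article, "https://example.com/blog/my-post")
```

Each dictionary could then become one TextNode (text plus metadata) instead of indexing the whole post as a single node.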
I'm looking into how to use BaseToolSpec, which is probably the best solution: define an API under the hood that first filters the articles based on some keywords.
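A minimal sketch of such a keyword pre-filter as a plain Python class. `ArticleSearchSpec`, `filter_by_keywords`, and the sample articles are hypothetical; to expose it to an agent in LlamaIndex you would subclass `BaseToolSpec` and list the method name in `spec_functions`, which is left out here so the snippet runs without the library.

```python
from typing import Dict, List


class ArticleSearchSpec:
    """Hypothetical tool: narrows articles by keyword before any vector search."""

    def __init__(self, articles: List[Dict[str, str]]) -> None:
        self.articles = articles

    def filter_by_keywords(self, keywords: List[str]) -> List[Dict[str, str]]:
        """Return articles whose title or body contains every keyword (case-insensitive)."""
        wanted = [k.lower() for k in keywords]
        hits = []
        for article in self.articles:
            haystack = f"{article['title']} {article['body']}".lower()
            if all(k in haystack for k in wanted):
                hits.append(article)
        return hits


spec = ArticleSearchSpec(
    [
        {"title": "Metadata filters in practice", "body": "How to filter nodes by type."},
        {"title": "Chunking long posts", "body": "Split text before indexing."},
        {"title": "Auto retrieval", "body": "Let the LLM pick metadata filters."},
    ]
)
hits = spec.filter_by_keywords(["metadata", "filter"])
```

The match is a simple substring test, so "filter" also matches "filters"; a real implementation might prefer tokenized or stemmed matching.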