Updated 3 weeks ago

Metadata

Hello everyone, I’m new to using LlamaIndex.

When working with Metadata Extraction, I don’t understand how LlamaIndex uses the extracted metadata (how it is stored and how it is queried to retrieve information). Does LlamaIndex convert metadata (such as title, summary, question-answer pairs, and entities) into text and append it to the original content before embedding, or does it embed the metadata separately?

Additionally, how can I query to retrieve the correct information based on the stored metadata? I would greatly appreciate any help from everyone.
4 comments
Yeah, it's stored as text, and by default both the embedding model and the LLM see it (this is configurable).
Since it's embedded, it will influence retrieval. If you have specific metadata fields you want to match on, there are metadata filters.
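
To make the "stored as text, and configurable" point concrete, here is a minimal plain-Python sketch of the idea. It is not LlamaIndex's implementation: `SketchNode`, `render`, and the `key: value` layout are hypothetical names and formatting chosen for illustration. In LlamaIndex itself, the corresponding knobs are the `excluded_embed_metadata_keys` and `excluded_llm_metadata_keys` attributes on documents and nodes.

```python
from dataclasses import dataclass, field


@dataclass
class SketchNode:
    """Hypothetical stand-in for a chunk of text plus its metadata."""
    text: str
    metadata: dict = field(default_factory=dict)
    excluded_embed_keys: list = field(default_factory=list)  # hidden from the embedding model
    excluded_llm_keys: list = field(default_factory=list)    # hidden from the LLM

    def render(self, audience: str) -> str:
        """Return the text that `audience` ("embed" or "llm") would see."""
        excluded = self.excluded_embed_keys if audience == "embed" else self.excluded_llm_keys
        lines = [f"{k}: {v}" for k, v in self.metadata.items() if k not in excluded]
        if not lines:
            return self.text
        # Metadata is rendered as plain text and prepended to the content.
        return "\n".join(lines) + "\n\n" + self.text


node = SketchNode(
    text="LlamaIndex stores extracted metadata alongside each chunk.",
    metadata={"title": "Metadata basics", "year": 2025},
    excluded_embed_keys=["year"],  # keep the year out of the embedding
)
embed_text = node.render("embed")  # title + content
llm_text = node.render("llm")      # title + year + content
```

Here the year still reaches the LLM but does not influence the embedding, which is the kind of split the "configurable" remark refers to.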
Thank you for your answer. Suppose I have documents with a metadata field for the publication year. How can I accurately filter documents published in 2025 with a query like "Answer the question based on documents published in 2025"? I am aware of the Metadata Filtering supported by databases, but I would need to define the filters beforehand in LlamaIndex. My question is: does LlamaIndex have any tool to automatically extract metadata-related information from a query and then apply filtering, or would I need to write a custom tool (like defining a function call) for this purpose?
Yeah, that'd be a custom LLM/function call to infer that.

There is an AutoRetriever in the framework that attempts to automate this for you, but imo you'll get better accuracy (and something easier to debug) if you build it around the scope of your documents and LLM instead.
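
As a rough illustration of the "build it yourself" route, here is a minimal sketch that infers a publication-year filter from the query and applies it before retrieval. To stay self-contained it swaps the LLM/function call for a simple regex, and the `year` metadata key, the filter dict shape, and both helper names are assumptions made for this example. In a real setup the inferred key and value would be turned into LlamaIndex metadata filters and passed to the retriever.

```python
import re
from typing import Optional


def infer_year_filter(query: str) -> Optional[dict]:
    """Hypothetical helper: pull a four-digit year out of the query.

    A regex stands in here for the LLM/function call mentioned above.
    """
    match = re.search(r"\b(19|20)\d{2}\b", query)
    if match is None:
        return None
    return {"key": "year", "operator": "==", "value": int(match.group(0))}


def apply_filter(docs: list, flt: Optional[dict]) -> list:
    """Keep only documents whose metadata matches the inferred filter."""
    if flt is None:
        return list(docs)  # nothing inferred: fall back to all documents
    return [d for d in docs if d["metadata"].get(flt["key"]) == flt["value"]]


docs = [
    {"text": "Report A", "metadata": {"year": 2024}},
    {"text": "Report B", "metadata": {"year": 2025}},
]
query = "Answer the question based on documents published in 2025"
flt = infer_year_filter(query)
matching = apply_filter(docs, flt)  # only the 2025 document remains
```

Keeping the inference step as its own function makes it easy to log what filter was inferred for each query, which helps with the debugging concern raised above.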