When working with Metadata Extraction, I don't understand how LlamaIndex uses the extracted metadata: how is it stored, and how is it queried at retrieval time? Does LlamaIndex convert the metadata (such as title, summary, question-answer pairs, entities) into text and append it to the original content before embedding, or does it embed the metadata separately?
Additionally, how can I query to retrieve the correct information based on the stored metadata? I would greatly appreciate any help.
Thank you for your answer. Suppose I have documents with a metadata field for the publication year. How can I accurately restrict retrieval to documents published in 2025 with a query like, "Answer the question based on documents published in 2025"? I am aware of the metadata filtering supported by vector databases, but I would need to define the filters beforehand in LlamaIndex. My question is: does LlamaIndex have any tool to automatically extract metadata-related information from a query and then apply filtering, or would I need to write a custom tool (for example, by defining a function call) for this purpose?
Yeah, that'd be a custom LLM/function call to infer the filters from the query.
There is an AutoRetriever in the framework (`VectorIndexAutoRetriever`) that attempts to automate this for you, but imo you'll get better accuracy (and something easier to debug) if you build it yourself around the scope of your documents and LLM instead.
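To make the custom route a bit more concrete, here is a rough sketch of the idea in plain Python: infer a filter from the query, then apply it to the document metadata before retrieval. Everything in it is an assumption for illustration. `Doc`, `infer_filters`, `apply_filters`, and the `year` metadata key are hypothetical names, and a regex stands in for the LLM/function call that would normally do the inference. In a real LlamaIndex setup you would, as far as I know, turn the inferred values into a `MetadataFilters` object and pass that to the retriever instead of filtering a list by hand.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a document/node with metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


# Matches a four-digit year starting with 19 or 20.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def infer_filters(query: str) -> dict:
    """Stand-in for the LLM/function call: pull a publication year out of the query.

    Returns an empty dict when no year is mentioned, meaning "no filtering".
    """
    match = YEAR_RE.search(query)
    return {"year": int(match.group(0))} if match else {}


def apply_filters(docs: list, filters: dict) -> list:
    """Keep only the docs whose metadata equals every inferred filter value."""
    return [
        d for d in docs
        if all(d.metadata.get(key) == value for key, value in filters.items())
    ]


docs = [
    Doc("Report A", {"year": 2024}),
    Doc("Report B", {"year": 2025}),
    Doc("Report C", {"year": 2025}),
]

query = "Answer the question based on documents published in 2025"
filters = infer_filters(query)
matches = apply_filters(docs, filters)

print(filters)
print([d.text for d in matches])
```

Keeping the inference step as its own small function is what makes this easier to debug than a fully automatic retriever: you can log the inferred filters for each query and check them against the metadata keys your documents actually have.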