Updated 6 months ago

llama_index/docs/examples/metadata_extra...

At a glance

The community member is reading a document on MetadataExtraction and using the QuestionsAnsweredExtractor to generate questions that the excerpt can answer. They are wondering how the vector search knows to search the "questions_this_excerpt_can_answer" metadata without it being specified as a metadata filter.

The comments explain that when nodes are embedded by llama-index, the metadata is included in the embedding. The community members can configure which metadata keys to include or exclude. They can also print the node's content with the metadata mode set to EMBED to see an example.

The community members conclude that the node/document object is highly customizable in this aspect, and they now understand that metadata is part of the embedded search.

I'm reading this doc on MetadataExtraction, and when using QuestionsAnsweredExtractor(questions=3, llm=llm), I see that it generates:
 'questions_this_excerpt_can_answer': '1. How many countries does Uber operate in?\n2. What is the total gross bookings of Uber in 2019?\n3. How many trips did Uber facilitate in 2019?'}


My understanding is that a vector search runs against a Document/Node's text. So how does it know to search questions_this_excerpt_can_answer without my specifying it as a metadata filter (using Qdrant, for example)?

https://github.com/run-llama/llama_index/blob/main/docs/examples/metadata_extraction/MetadataExtractionSEC.ipynb
5 comments
When nodes are embedded by llama-index, the metadata is included in the text that gets embedded, so it becomes part of the vector that is searched.

It's something like embed_model.get_text_embedding(node.get_content(metadata_mode=MetadataMode.EMBED))
You can actually configure which metadata keys to include/exclude too
You can actually try it and see an example right now:
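# On llama-index 0.10 and later, this import lives at llama_index.core.schema.
from llama_index.schema import MetadataMode

# `node` is any node you have already parsed or loaded from your index.
print(node.get_content(metadata_mode=MetadataMode.EMBED))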
the node/document object is HIGHLY customizable in this aspect
Ahh, I see. I thought metadata was not part of the embedded search. Good to know, thanks.
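
For anyone who wants to see the idea without installing anything, here is a rough, stdlib-only sketch of what the replies describe: the metadata is rendered as "key: value" lines, joined with the node text, and that combined string is what an embedding model would receive. ToyNode, excluded_embed_keys, and embed_content are hypothetical names made up for this illustration, and the sample text and file name are placeholders; this is not llama-index's actual implementation. In llama-index itself, the include/exclude behaviour mentioned above is controlled through node attributes such as excluded_embed_metadata_keys, and the exact string layout may differ between versions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Hypothetical stand-in for a llama-index node; not the library's class."""

    text: str
    metadata: dict = field(default_factory=dict)
    # Keys listed here are left out of the string that gets embedded.
    excluded_embed_keys: list = field(default_factory=list)

    def embed_content(self) -> str:
        """Build the string an embedding model would receive."""
        lines = [
            f"{key}: {value}"
            for key, value in self.metadata.items()
            if key not in self.excluded_embed_keys
        ]
        if not lines:
            return self.text
        # Metadata lines first, then a blank line, then the node text.
        return "\n".join(lines) + "\n\n" + self.text


# Placeholder node: the text and file name are invented for the example;
# the question string is taken from the extractor output quoted above.
node = ToyNode(
    text="Uber reported its 2019 results.",
    metadata={
        "file_name": "uber_10k.pdf",
        "questions_this_excerpt_can_answer": (
            "1. How many trips did Uber facilitate in 2019?"
        ),
    },
    excluded_embed_keys=["file_name"],
)

embedded_text = node.embed_content()
print(embedded_text)
```

Because the generated questions end up inside the embedded string, a query phrased like one of those questions lands close to the node in vector space, with no metadata filter involved. A metadata filter in Qdrant is a separate mechanism that narrows results by exact field values.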