
Updated last year


At a glance

The post asks whether the provided code, which passes node.get_content(metadata_mode="all") to embed_model.get_text_embedding(), also embeds the node's metadata in the resulting vector. One community member confirms that the metadata is included, and another worries that this could skew the vector space towards irrelevant data. Some members suggest that metadata usually improves embeddings, while others recommend including only the relevant metadata fields, which can be done by setting excluded_embed_metadata_keys and excluded_llm_metadata_keys on the node.

node_embedding = embed_model.get_text_embedding(node.get_content(metadata_mode="all"))
Does this code also embed the metadata in the resulting vector?
Yeah, it does. Try printing the output of the get_content() function to see what it's putting in.
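A quick way to see what ends up in the vector is to render the string yourself. The sketch below is plain Python, not LlamaIndex code: `render_content` is a hypothetical helper, and the templates (`"{key}: {value}"` per entry, entries joined by newlines, then `"{metadata_str}\n\n{content}"`) are an assumption modeled on what LlamaIndex nodes appear to use by default. Printing the real `node.get_content(metadata_mode="all")` output remains the authoritative check.

```python
# Hypothetical stand-in for node.get_content(metadata_mode=...).
# The templates below are ASSUMED to resemble LlamaIndex's defaults; this is not library code.
METADATA_TEMPLATE = "{key}: {value}"
METADATA_SEPARATOR = "\n"
TEXT_TEMPLATE = "{metadata_str}\n\n{content}"


def render_content(text: str, metadata: dict, mode: str = "all") -> str:
    """Return the string that would be handed to the embedding model."""
    if mode == "none" or not metadata:
        return text
    # One "key: value" line per metadata entry, in insertion order.
    metadata_str = METADATA_SEPARATOR.join(
        METADATA_TEMPLATE.format(key=k, value=v) for k, v in metadata.items()
    )
    return TEXT_TEMPLATE.format(metadata_str=metadata_str, content=text)


meta = {"title": "Password reset", "source_doc": "docs/faq_v2.txt"}  # made-up metadata
body = "To reset your password, open Settings and choose Security."
print(render_content(body, meta, mode="all"))
print("---")
print(render_content(body, meta, mode="none"))
```

With `mode="all"`, both metadata lines sit above the body text, so every key contributes tokens to the embedding.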
I find it a bit weird that they recommend that on the RAG from scratch page. It would skew the vector space towards context-irrelevant data. Wouldn't it be better to embed only what is relevant to the context?
Usually people put helpful stuff in their metadata, actually. I feel like in most cases it improves embeddings.
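Both positions can hold at once, depending on what the metadata contains. This toy sketch uses bag-of-words cosine similarity as a crude stand-in for a real embedding model (an assumption: dense embeddings differ in detail, and only the direction of the effect is the point). The query, body text, and metadata values are made up for illustration.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Lowercased bag-of-words counts; a toy stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


query = "how do I reset my password"
body = "To reset your password, open Settings and choose Security."
# Descriptive metadata shares words with a likely query...
with_title = "title: Password reset\n\n" + body
# ...while a file path mostly adds tokens no query will match.
with_path = "source_doc: archive/2021/export_batch_17.txt\n\n" + body

for label, doc in [("body only", body), ("with title", with_title), ("with source_doc", with_path)]:
    print(f"{label:16} {cosine(bow(query), bow(doc)):.3f}")
```

Here the title raises the score from about 0.272 to 0.408, while the file path dilutes it to about 0.198, which is the "helpful metadata" and the "skewed vector space" argument in miniature.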
Take source_doc, for example: in most cases it wouldn't improve the embedding. Ideally one would select only the metadata fields that are relevant, not all of them. I'm building the node content manually because of that.
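Building the content manually can be as simple as an allow-list of metadata keys. In this sketch, `EMBED_KEYS` and `build_embed_text` are hypothetical names and the metadata is made up; the returned string is what you would pass to `embed_model.get_text_embedding()` instead of `node.get_content(metadata_mode="all")`.

```python
# Hypothetical allow-list: only these fields are judged relevant to retrieval.
EMBED_KEYS = ("title", "section")


def build_embed_text(text: str, metadata: dict, keys=EMBED_KEYS) -> str:
    """Prefix the text with only the allow-listed metadata fields, in a fixed order."""
    lines = [f"{k}: {metadata[k]}" for k in keys if k in metadata]
    if not lines:
        return text
    return "\n".join(lines) + "\n\n" + text


meta = {
    "title": "Password reset",
    "section": "Account security",
    "source_doc": "archive/2021/export_batch_17.txt",  # dropped: not in EMBED_KEYS
}
print(build_embed_text("Open Settings and choose Security.", meta))
```

An allow-list fails closed: a metadata field added later stays out of the embedding until you opt it in, whereas an exclusion list lets new fields through by default.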
You can set which metadata keys to exclude, separately for embeddings and for what gets sent to the LLM:

node.excluded_embed_metadata_keys = ["key1", ...]
node.excluded_llm_metadata_keys = ["key1", ...]
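The two exclusion lists can be mimicked in plain Python to show which view each one affects. `SketchNode` is a hypothetical stand-in, not LlamaIndex's node class, and the metadata is made up. It assumes LlamaIndex's metadata modes behave like this: `"embed"` drops `excluded_embed_metadata_keys`, `"llm"` drops `excluded_llm_metadata_keys`, and `"all"` ignores both lists.

```python
from dataclasses import dataclass, field


@dataclass
class SketchNode:
    """Hypothetical stand-in for a LlamaIndex node; not the real class."""
    text: str
    metadata: dict = field(default_factory=dict)
    excluded_embed_metadata_keys: list = field(default_factory=list)
    excluded_llm_metadata_keys: list = field(default_factory=list)

    def get_content(self, metadata_mode: str = "none") -> str:
        if metadata_mode == "none":
            return self.text
        if metadata_mode == "embed":
            excluded = set(self.excluded_embed_metadata_keys)
        elif metadata_mode == "llm":
            excluded = set(self.excluded_llm_metadata_keys)
        else:  # "all": ASSUMED to ignore both exclusion lists
            excluded = set()
        meta_lines = [f"{k}: {v}" for k, v in self.metadata.items() if k not in excluded]
        if not meta_lines:
            return self.text
        return "\n".join(meta_lines) + "\n\n" + self.text


node = SketchNode(
    text="Open Settings and choose Security.",
    metadata={"title": "Password reset", "source_doc": "export_batch_17.txt", "doc_id": "a1b2"},
)
node.excluded_embed_metadata_keys = ["source_doc", "doc_id"]  # keep IDs and paths out of the vector
node.excluded_llm_metadata_keys = ["doc_id"]  # the LLM still sees source_doc, e.g. for citations

for mode in ("all", "embed", "llm"):
    print(f"--- {mode} ---")
    print(node.get_content(metadata_mode=mode))
```

If the real library matches this assumption, the exclusions only take effect when content is rendered with `metadata_mode="embed"` (or `"llm"`), so the original snippet's `metadata_mode="all"` would still embed every key.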