The post asks whether the provided code, which passes node.get_content(metadata_mode="all") to embed_model.get_text_embedding(), also embeds the metadata in the resulting vector. One community member confirms in the comments that the metadata is included, and others raise the concern that this could skew the vector space towards irrelevant data. Some community members suggest that metadata can help improve embeddings, while others recommend including only the relevant metadata fields.
node_embedding = embed_model.get_text_embedding(
    node.get_content(metadata_mode="all")
)
Does this code also embed the metadata in the resulting vector?
I find it a bit weird that they recommend that in the RAG from scratch page. It would skew the vector space towards context-irrelevant data. Wouldn’t it be better to embed only what is context-relevant?
source_doc, for example, wouldn't improve the embedding in most cases. Ideally one would want to select only the metadata fields that are relevant, not all of them. I'm building the node content manually because of that.
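
For anyone wanting to try the "select only the relevant fields" approach, here is a minimal sketch of building the text to embed by hand. It is plain Python and does not reproduce what anyone in the thread actually wrote: build_embed_text, include_keys and the sample metadata values are hypothetical names made up for illustration. LlamaIndex nodes also appear to offer excluded_embed_metadata_keys together with MetadataMode.EMBED for the same purpose, but check that against the version you are running.

```python
from typing import Any, Mapping, Sequence


def build_embed_text(
    text: str,
    metadata: Mapping[str, Any],
    include_keys: Sequence[str],
) -> str:
    """Prepend only the chosen metadata fields to the text that gets embedded.

    Hypothetical helper: keys missing from `metadata` are skipped, and the
    text is returned unchanged when no selected key is present.
    """
    lines = [f"{key}: {metadata[key]}" for key in include_keys if key in metadata]
    if not lines:
        return text
    return "\n".join(lines) + "\n\n" + text


# Made-up example metadata: `title` is kept, `source_doc` is left out.
metadata = {"title": "Quarterly report", "source_doc": "files/report_q3.pdf"}
embed_text = build_embed_text("Revenue grew in Q3.", metadata, include_keys=["title"])

# embed_text would then replace node.get_content(metadata_mode="all"), e.g.
# node_embedding = embed_model.get_text_embedding(embed_text)
```

The allow-list is deliberate: a new metadata field added later stays out of the embedding until someone decides it is relevant, which is the opposite of what metadata_mode="all" does.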