Hello, my team and I are having a bit of trouble understanding the metadata portion of Documents within this guide here: https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_documents.html#metadata

The guide states that "By default, the metadata is injected into the text for both embedding and LLM model calls." Does this mean the metadata is also embedded by LlamaIndex along with the original chunk of text? Or is it that the chunk of text is embedded and the metadata is injected into a different data structure?
This just means that when generating embeddings or sending text to the LLM, the text input to both will look something like:

"{metadata_str}\n\n{content}"


Further down the page, more details are given on how to customize this:

https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_documents.html#advanced-metadata-customization
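To make the injection concrete, here is a minimal plain-Python sketch of the behavior described above (not LlamaIndex's actual implementation): metadata key/value pairs are flattened to text and prepended to the chunk content before that combined string is embedded or sent to the LLM. The function name and formatting details are illustrative assumptions.

```python
# Sketch only: mimics the documented "{metadata_str}\n\n{content}" template.
# build_input is a hypothetical helper, not a LlamaIndex API.

def build_input(content: str, metadata: dict) -> str:
    """Flatten metadata to "key: value" lines and prepend it to the chunk text."""
    metadata_str = "\n".join(f"{k}: {v}" for k, v in metadata.items())
    return f"{metadata_str}\n\n{content}" if metadata else content

text = build_input("LlamaIndex is a data framework.", {"file_name": "intro.md"})
print(text)
# file_name: intro.md
#
# LlamaIndex is a data framework.
```

The embedding model then sees this single combined string, which is why the resulting vector reflects both the metadata and the content.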
So just to clarify, LlamaIndex does not actually embed the metadata? It only sends it?
Thanks for all your help btw
Right, we are just sending text to OpenAI by default, and OpenAI returns the embedding vector. This uses the text-embedding-ada-002 model from OpenAI.
Oh, so LlamaIndex is sending both the content and the metadata to the embedding model, and in return the embedding model returns a vector that represents both the metadata and the content?
The embedding vector would be a representation of both yes
interesting okay sounds good! Thanks again
The powerful thing about this is that you can use metadata to bias the embeddings for retrieval, but turn it off from being seen by the LLM 🧠
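A rough sketch of that idea in plain Python, assuming per-mode exclusion lists along the lines of the excluded-metadata-keys customization described in the linked guide (the function and parameter names here are illustrative, not the library's API):

```python
# Sketch: the same metadata dict can produce different text per consumer.
# Keys listed in an exclusion list for a mode are hidden from that mode.

def content_for(mode: str, content: str, metadata: dict,
                excluded_llm=(), excluded_embed=()) -> str:
    excluded = excluded_llm if mode == "llm" else excluded_embed
    visible = {k: v for k, v in metadata.items() if k not in excluded}
    metadata_str = "\n".join(f"{k}: {v}" for k, v in visible.items())
    return f"{metadata_str}\n\n{content}" if metadata_str else content

m = {"file_name": "intro.md", "category": "docs"}
# The embedding sees everything, so "category" can bias retrieval...
print(content_for("embed", "Some chunk text.", m))
# ...while the LLM never sees it.
print(content_for("llm", "Some chunk text.", m, excluded_llm=["category"]))
```

So the metadata steers which chunks are retrieved without cluttering the prompt the LLM actually reads.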
Also you're welcome! Happy to help
Yeah, that's what we are seeing with our responses too, but we wanted to clarify this!
Super cool feature