Hello, my team and I are having a bit of trouble understanding the metadata portion of Documents within this guide here: https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_documents.html#metadata

The guide states that "By default, the metadata is injected into the text for both embedding and LLM model calls." Does this mean the metadata is also embedded by LlamaIndex along with the original chunk of text? Or is it that the chunk of text is embedded and the metadata is injected into a different data structure?
This just means that when generating embeddings or sending text to the LLM, the text input to both will look something like:

"{metadata_str}\n\n{content}"


Further down the page, more details are given on how to customize this:

https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_documents.html#advanced-metadata-customization
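To make the injection concrete, here is a minimal plain-Python sketch of the behavior described above (not LlamaIndex's actual implementation): metadata key/value pairs are flattened to text and prepended to the chunk content before that combined string is embedded or sent to the LLM. The function name and formatting details are illustrative assumptions.

```python
# Sketch only: mimics the documented "{metadata_str}\n\n{content}" template.
# build_input is a hypothetical helper, not a LlamaIndex API.

def build_input(content: str, metadata: dict) -> str:
    """Flatten metadata to "key: value" lines and prepend it to the chunk text."""
    metadata_str = "\n".join(f"{k}: {v}" for k, v in metadata.items())
    return f"{metadata_str}\n\n{content}" if metadata else content

text = build_input("LlamaIndex is a data framework.", {"file_name": "intro.md"})
print(text)
# file_name: intro.md
#
# LlamaIndex is a data framework.
```

The embedding model then sees this single combined string, which is why the resulting vector reflects both the metadata and the content.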
So just to clarify, LlamaIndex does not actually embed the metadata? It only sends it?
Thanks for all your help btw
Right, we are just sending text to OpenAI by default, and OpenAI returns the embedding vector. This uses the text-embedding-ada-002 model from OpenAI.
Oh, so LlamaIndex is sending both the content and the metadata to the embedding model, and in return the embedding model returns a vector that represents both the metadata and the content?
The embedding vector would be a representation of both yes
interesting okay sounds good! Thanks again
The powerful thing about this is that you can use metadata to bias the embeddings for retrieval, but turn it off from being seen by the LLM 🧠
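A rough sketch of that idea in plain Python, assuming per-mode exclusion lists along the lines of the excluded-metadata-keys customization described in the linked guide (the function and parameter names here are illustrative, not the library's API):

```python
# Sketch: the same metadata dict can produce different text per consumer.
# Keys listed in an exclusion list for a mode are hidden from that mode.

def content_for(mode: str, content: str, metadata: dict,
                excluded_llm=(), excluded_embed=()) -> str:
    excluded = excluded_llm if mode == "llm" else excluded_embed
    visible = {k: v for k, v in metadata.items() if k not in excluded}
    metadata_str = "\n".join(f"{k}: {v}" for k, v in visible.items())
    return f"{metadata_str}\n\n{content}" if metadata_str else content

m = {"file_name": "intro.md", "category": "docs"}
# The embedding sees everything, so "category" can bias retrieval...
print(content_for("embed", "Some chunk text.", m))
# ...while the LLM never sees it.
print(content_for("llm", "Some chunk text.", m, excluded_llm=["category"]))
```

So the metadata steers which chunks are retrieved without cluttering the prompt the LLM actually reads.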
Also you're welcome! Happy to help
Yeah, that's what we are seeing with our responses too, but we wanted to clarify this!
Super cool feature