
Embeddings

At a glance

The community member is trying to use a mix of Hugging Face APIs and LlamaIndex functions to build a Streamlit-hosted RAG. They are having trouble understanding the extra processing in get_query_embedding, specifically the normalization applied after the embeddings are returned by the Hugging Face model. The comments explain that some embedding models use separate "instructions" for embedding queries vs. text, and that the normalization stabilizes the results and prevents outliers.

I am trying to use a mix of Hugging Face APIs and LlamaIndex functions to build a Streamlit-hosted RAG. I am having trouble understanding why there are extra processing steps in the get_query_embedding step (specifically, https://github.com/run-llama/llama_index/blob/main/llama_index/embeddings/huggingface.py#L139). What's the difference between using get_query_embedding and just getting the embeddings from a Hugging Face model?
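For context, here is a minimal sketch of the "just getting the embeddings from a Hugging Face model" baseline the question refers to: tokenize the input, run the encoder, and mean-pool the last hidden state. The model name is illustrative, and some models pool differently (e.g. CLS-token pooling):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative model choice; any encoder-style embedding model works similarly.
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-small-en")
model = AutoModel.from_pretrained("BAAI/bge-small-en")

def raw_embed(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        output = model(**inputs)
    # Mean-pool the token embeddings, weighted by the attention mask,
    # to get a single fixed-size vector for the whole input.
    mask = inputs["attention_mask"].unsqueeze(-1)
    summed = (output.last_hidden_state * mask).sum(dim=1)
    return summed / mask.sum(dim=1)
```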
4 comments
Lots of embedding models have separate "instructions" for embedding queries vs. text (i.e. attaching some prefix to the query or text you are about to embed)

Some systems (like the DRAGON retriever) even use two models!
For some models, though, this will just be the same embedding method for both queries and text
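As an example of the instruction-prefix convention: BGE-family models prepend a fixed instruction string to queries only, while documents are embedded as plain text. The model name and instruction string below follow the BAAI/bge conventions; other models use different prefixes or none at all:

```python
from sentence_transformers import SentenceTransformer

# BGE-style query instruction: attached to queries only, never to documents.
QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages: "

model = SentenceTransformer("BAAI/bge-small-en")

def embed_query(query: str):
    # Query side: instruction prefix + query text.
    return model.encode(QUERY_INSTRUCTION + query)

def embed_text(passage: str):
    # Document side: plain passage text, no prefix.
    return model.encode(passage)
```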
Interesting, thanks @Logan M! Just above, in the _embed function, is there a reason to do a final normalization after getting the embeddings from the model?
Just to stabilize the results a bit and prevent outliers -- it's good practice πŸ‘
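The normalization in question is typically an L2 normalization. A minimal NumPy sketch of what that step does, assuming the embedding arrives as a plain array:

```python
import numpy as np

def l2_normalize(embedding: np.ndarray) -> np.ndarray:
    # Scale the vector to unit length so no embedding dominates by
    # magnitude; cosine similarity then reduces to a plain dot product.
    norm = np.linalg.norm(embedding)
    return embedding / norm if norm > 0 else embedding
```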