
Embeddings

I am trying to use a mix of Hugging Face APIs and LlamaIndex functions to build a Streamlit-hosted RAG app. I am having trouble understanding why there are extra processing steps in get_query_embedding (specifically, https://github.com/run-llama/llama_index/blob/main/llama_index/embeddings/huggingface.py#L139). What's the difference between using get_query_embedding and just getting the embeddings from a Hugging Face model directly?
Lots of embedding models have separate "instructions" for embedding queries vs. text (i.e., attaching some prefix to the query or text you are about to embed)

Some systems (like the DRAGON retriever) even use two models!
For some models though, this will just be the same embedding method for both query and text
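For example, BGE-family models expect a specific instruction string prepended to queries but not to passages. Here's a minimal sketch of the idea using sentence-transformers directly; the model name and instruction string are the published ones for BAAI/bge-small-en-v1.5, and other models use different prefixes or none at all:

```python
from sentence_transformers import SentenceTransformer

# Assumed model: BAAI/bge-small-en-v1.5, which is trained with a query
# instruction but embeds passages without one.
model = SentenceTransformer("BAAI/bge-small-en-v1.5")
QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages: "

def embed_query(query: str):
    # Queries get the instruction prefix prepended before encoding.
    return model.encode(QUERY_INSTRUCTION + query)

def embed_text(text: str):
    # Documents/passages are embedded as-is for this model family.
    return model.encode(text)
```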
Interesting, thanks @Logan M! Just above, in the _embed function, is there a reason to do a final normalization after getting the embeddings from the model?
Just to stabilize the results a bit and prevent outliers -- it's good practice πŸ‘
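In NumPy terms, that final step just rescales each embedding to unit L2 norm, which also makes the dot product between two embeddings equal to their cosine similarity downstream. A sketch of the operation (the library code does the equivalent on torch tensors):

```python
import numpy as np

def l2_normalize(embedding: np.ndarray) -> np.ndarray:
    # Divide by the vector's L2 norm so it has unit length; after this,
    # dot product and cosine similarity between embeddings coincide.
    norm = np.linalg.norm(embedding)
    return embedding / norm if norm > 0 else embedding
```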