
Updated 5 months ago

Embeddings

At a glance

The community members are discussing how to choose the right embedding for a conversational model. They explain that embeddings are separate from the choice of language model, and are used in the retrieval step to compare the input to the indexed data. Different embeddings have different dimensionality (e.g. 384 vs 1024) and the community members suggest referencing leaderboards to choose a top-performing embedding, with bge-large-en-v1.5 being a common recommendation. However, they note that many embeddings are quite similar, so the specific choice may not be critical as long as it is from a high-performing option.

How do you find which embedding to use? I can’t figure out what to choose for a conversational model
13 comments
Embeddings are not related to your choice of LLM -- they are separate components

Use the embeddings that work best for you 🫡
Does the embedding change the way it reacts to inputs? I gotta do some reading up on all this stuff it’s so new to me 😆
Embeddings change the "retrieval" step -- essentially, embeddings are what is used to compare a query to all the data you indexed and retrieve the top-k most similar pieces of text

Then, using the retrieved text, the LLM handles synthesizing a response to the query in natural language
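To make the retrieval step concrete, here is a minimal sketch of top-k retrieval by cosine similarity. The 3-dimensional vectors and the `retrieve_top_k` helper are made up for illustration; a real embedding model produces the vectors, and a vector store normally does the search for you.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_top_k(query_vec, indexed, k=2):
    """Return the k indexed texts whose vectors are most similar to the query.

    `indexed` is a list of (text, vector) pairs. Hypothetical helper, not a
    library API.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Toy vectors standing in for real embeddings (assumption: real ones have
# hundreds of dimensions and come from an embedding model).
indexed = [
    ("refund policy", [0.9, 0.1, 0.0]),
    ("shipping times", [0.1, 0.9, 0.0]),
    ("return an item", [0.8, 0.2, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]

top = retrieve_top_k(query_vec, indexed, k=2)
print(top)
```

The texts in `top` are what gets handed to the LLM as context. Swapping the embedding model only changes the vectors, and therefore which texts end up in `top`.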
thanks that makes sense
Embeddings turn your text into vectors (or a matrix, for a batch of texts). Different embeddings have different numbers of dimensions.
@Logan M is the one I referenced a better choice for Q&A over small PDF files?
eh, they are probably similar. I would probably use bge-large-en-v1.5, maybe convert it to onnx or something too
How do we determine which one is best? There are lots of options, and it's unclear how they all differ
384 dimensional vs 1024
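One practical effect of dimensionality is index size. A rough back-of-the-envelope sketch, assuming vectors are stored as float32 and counting only the raw vectors (no index overhead or metadata); the 100,000-chunk corpus is a hypothetical figure:

```python
BYTES_PER_FLOAT32 = 4

def index_size_mb(num_chunks, dim):
    """Approximate raw storage for num_chunks vectors of length dim, in MB."""
    return num_chunks * dim * BYTES_PER_FLOAT32 / 1_000_000

# Hypothetical corpus of 100,000 text chunks.
small = index_size_mb(100_000, 384)
large = index_size_mb(100_000, 1024)
print(f"384-dim: {small:.1f} MB, 1024-dim: {large:.1f} MB")
```

Larger vectors cost more memory and more compute per comparison. Whether they retrieve better depends on the model, not just the size.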
Please share any reading material that helps explain the factors that go into picking the best embedding lib
They are all extremely similar tbh. I just reference the leaderboard usually

Just pick one from around the top and you'll be fine
https://huggingface.co/spaces/mteb/leaderboard
bge is usually my go-to choice