Embeddings

How do you find which embedding to use? I can’t figure out what to choose for a conversational model
Embeddings are not related to your choice of LLM -- they are separate components

Use the embeddings that work best for you 🫡
Does the embedding change the way the model reacts to inputs? I gotta do some reading up on all this stuff, it's so new to me 😆
embeddings change the "retrieval" step -- essentially, embeddings are what's used to compare a query against all the data you indexed and retrieve the top-k most similar chunks

Then, using the retrieved text, the LLM handles synthesizing a response to the query in natural language
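To make that concrete, here's a minimal sketch of the retrieval step in plain Python, assuming the sentence-transformers package; the model name, example documents, and top_k value are only illustrative:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Illustrative embedding model -- any model from the MTEB leaderboard works the same way
model = SentenceTransformer("BAAI/bge-small-en-v1.5")

# Texts you have "indexed" (in a real app these vectors would live in a vector store)
documents = [
    "Embeddings map text to fixed-size vectors.",
    "The LLM synthesizes an answer from the retrieved text.",
    "bge-large-en-v1.5 produces 1024-dimensional vectors.",
]
doc_vectors = model.encode(documents, normalize_embeddings=True)

# Embed the query with the same model, then rank documents by cosine similarity
query_vector = model.encode(["How does retrieval work?"], normalize_embeddings=True)[0]
scores = doc_vectors @ query_vector  # dot product == cosine similarity on normalized vectors

top_k = 2
top_indices = np.argsort(scores)[::-1][:top_k]
for i in top_indices:
    print(f"{scores[i]:.3f}  {documents[i]}")
```

The retrieved texts then go into the LLM's prompt as context; the embedding model itself never writes the answer.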
thanks that makes sense
Embeddings turn your text into vectors / matrices. Different embedding models have different dimensions and characteristics…
@Logan M is the one I referenced a better choice for Q&A over a PDF with small file sizes?
eh, they are probably similar. I would probably use bge-large-en-v1.5, maybe convert it to onnx or something too
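For reference, a quick sketch of loading bge-large-en-v1.5 via sentence-transformers (an assumption on my part; an ONNX export, e.g. through Hugging Face Optimum, would be an optional extra step not shown here):

```python
from sentence_transformers import SentenceTransformer

# BAAI/bge-large-en-v1.5: a general-purpose English embedding model with 1024-dim output
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

vec = model.encode("What does this PDF say about pricing?", normalize_embeddings=True)
print(vec.shape)  # (1024,)
```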
How do we determine which one is best? There are lots of options and it's unclear how they're all different
384-dimensional vs 1024-dimensional, for example
Please share any reading material that helps with understanding the factors that go into picking the best embedding model
They are all extremely similar tbh. I just reference the leaderboard usually

Just pick one from around the top and you'll be fine
https://huggingface.co/spaces/mteb/leaderboard
bge is usually my go-to choice
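As a rough illustration of the 384 vs 1024 dimension point above, here's a small sketch using two common models from the leaderboard as arbitrary examples; the calling code is identical, only the vector size changes:

```python
from sentence_transformers import SentenceTransformer

text = "Which embedding model should I use for conversational Q&A?"

# Two arbitrary examples with different output sizes
small = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # 384-dim
large = SentenceTransformer("BAAI/bge-large-en-v1.5")                  # 1024-dim

print(small.encode(text).shape)  # (384,)
print(large.encode(text).shape)  # (1024,)
```

Higher-dimensional vectors cost more to store and compare, but dimensionality alone doesn't decide retrieval quality, which is why the leaderboard scores are the more useful guide.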