Updated 9 months ago

intfloat/multilingual-e5-large · Hugging...

Can anyone recommend an embedding model for Swedish? I am currently using https://huggingface.co/intfloat/multilingual-e5-large but am not getting great results with RAG. It works, but it seems oddly fixated on specific chunks for no apparent reason.
11 comments
Have you tried reranking?

I'm not sure about Swedish specifically, but a good open combo might be BAAI/bge-m3 for embeddings with BAAI/bge-reranker-large as the reranker.
You could also look at fine-tuning your embedding model.
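# pip install llama-index-embeddings-huggingface llama-index-postprocessor-flag-embedding-reranker
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker

# Use BAAI/bge-m3 as the global embedding model
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-m3")

# Rerank the retrieved chunks and keep only the 3 best
rerank = FlagEmbeddingReranker(top_n=3, model="BAAI/bge-reranker-large")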
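In case the two-stage idea isn't obvious, here is a rough pure-Python sketch of what "retrieve, then rerank" does. Everything in it is made up for illustration: `retrieve_then_rerank`, `toy_embed_score` and `toy_rerank_score` are hypothetical stand-ins (simple word overlap), not LlamaIndex APIs and not how bge-m3 or bge-reranker-large actually score text. The point is only the shape: a cheap first pass keeps a wide candidate set, then a stricter scorer reorders it and keeps `top_n`.

```python
from typing import Callable, List, Tuple


def _words(text: str) -> set:
    return set(text.lower().split())


def toy_embed_score(query: str, chunk: str) -> float:
    # Hypothetical stand-in for embedding similarity:
    # fraction of query words that appear in the chunk.
    q, c = _words(query), _words(chunk)
    return len(q & c) / len(q) if q else 0.0


def toy_rerank_score(query: str, chunk: str) -> float:
    # Hypothetical stand-in for a reranker: Jaccard overlap,
    # which also penalises long, unfocused chunks.
    q, c = _words(query), _words(chunk)
    union = q | c
    return len(q & c) / len(union) if union else 0.0


def retrieve_then_rerank(
    query: str,
    chunks: List[str],
    embed_score: Callable[[str, str], float],
    rerank_score: Callable[[str, str], float],
    similarity_top_k: int = 10,
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    # Stage 1: cheap first pass, keep a wide candidate set.
    candidates = sorted(
        chunks, key=lambda c: embed_score(query, c), reverse=True
    )[:similarity_top_k]
    # Stage 2: rescore only the candidates and keep the best top_n.
    rescored = [(c, rerank_score(query, c)) for c in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:top_n]


chunks = [
    "stockholm is the capital of sweden and the capital hosts many museums parks islands and bridges",
    "the capital of sweden is stockholm",
    "gothenburg is a city on the west coast",
]
top = retrieve_then_rerank(
    "capital of sweden", chunks, toy_embed_score, toy_rerank_score,
    similarity_top_k=2, top_n=1,
)
print(top)
```

Both of the first two chunks tie in the first pass, and the second stage prefers the short, focused one. In LlamaIndex itself, the `rerank` object above is typically passed to a query engine via `node_postprocessors=[rerank]`, with `similarity_top_k` set higher than `top_n` so the reranker has candidates to choose from.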
@sysfor I have the same concern for French.
How can I fine-tune an embedding model?
For French, you could look at using the mistral-embed model. They are French after all, so I would suspect it works well.
It's "cheap" price-wise, but I suspect bge-m3 would work too. I've only really ever looked at English since that's what I use.
pip install llama-index-embeddings-mistralai
from llama_index.core import Settings
from llama_index.embeddings.mistralai import MistralAIEmbedding
Settings.embed_model = MistralAIEmbedding(model_name="mistral-embed", api_key="<YOUR_API_KEY>")
^ I've not done this, but I'll consider it to improve my RAG pipeline.