
Updated 11 months ago


At a glance

The community member is looking for a recommendation for an embedding model for Swedish, as they are currently using https://huggingface.co/intfloat/multilingual-e5-large but are not getting great results with RAG. Other community members suggest trying reranking with a combination of BAAI/bge-m3 for embeddings and BAAI/bge-reranker-large for reranking. They also suggest fine-tuning the embedding model, and provide example code for using the FlagEmbeddingReranker and HuggingFaceEmbedding from the llama_index library. Another community member asks about fine-tuning the embedding model for French, and is recommended to try the mistral-embed model or the BAAI/bge-m3 model.

Can anyone recommend an embedding model for Swedish? I am currently using https://huggingface.co/intfloat/multilingual-e5-large but am not getting great results with RAG. It works, but seems oddly fixated on specific chunks for no clear reason.
11 comments
Have you tried reranking?

I'm not sure specifically about Swedish, but a good open combination might be BAAI/bge-m3 for embeddings with BAAI/bge-reranker-large as the reranker.
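To make the retrieve-then-rerank idea concrete, here is a minimal, self-contained sketch in plain Python. The vectors and the term-overlap scorer are toy stand-ins: in practice the embeddings would come from a model like bge-m3 and the rerank scores from a cross-encoder like bge-reranker-large.

```python
def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def retrieve(query_vec, docs, k):
    """Stage 1: cheap vector similarity over the whole corpus."""
    return sorted(docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)[:k]

def rerank(query_terms, candidates, top_n):
    """Stage 2: a more expensive scorer applied only to the candidates.
    A toy term-overlap count stands in for a cross-encoder here."""
    def score(doc):
        return len(set(query_terms) & set(doc["text"].split()))
    return sorted(candidates, key=score, reverse=True)[:top_n]

# Made-up corpus with made-up 2-d "embeddings"
docs = [
    {"text": "swedish tax rules for companies", "vec": [0.9, 0.1]},
    {"text": "recipes for swedish meatballs", "vec": [0.8, 0.2]},
    {"text": "corporate tax deadlines", "vec": [0.1, 0.9]},
]
candidates = retrieve([1.0, 0.0], docs, k=2)
best = rerank(["swedish", "tax"], candidates, top_n=1)
print(best[0]["text"])  # the reranker promotes the tax doc over the meatballs
```

The point of the two stages: stage 1 can be fixated on superficially similar chunks (like the embedding model in the question), and the reranker gets a second, more careful look at only the shortlist.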
You could also look at fine-tuning your embedding model.
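On fine-tuning: whichever framework you use, the raw material is (query, relevant passage) pairs mined from your own corpus. A minimal sketch of that data shape follows; the Swedish strings and IDs are made-up examples, and the corpus/queries/relevant_docs layout is modeled on what LlamaIndex's embedding fine-tuning dataset expects (check the current docs before relying on it).

```python
import json

# Hypothetical corpus chunks, keyed by node ID
corpus = {
    "n1": "Anmälan om flytt ska göras till Skatteverket inom en vecka.",
    "n2": "Styrelsen sammanträder första måndagen varje månad.",
}

# Hypothetical synthetic queries (often generated by an LLM over each chunk)
queries = {
    "q1": "När ska flyttanmälan göras?",
}

# Which chunk answers which query — the positive pairs used for training
relevant = {"q1": ["n1"]}

dataset = {"corpus": corpus, "queries": queries, "relevant_docs": relevant}
serialized = json.dumps(dataset, ensure_ascii=False)
```

Once you have pairs like these, a fine-tuning engine trains the embedding model so that each query lands closer to its relevant chunk than to the rest of the corpus.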
# pip install llama-index-embeddings-huggingface llama-index-postprocessor-flag-embedding-reranker
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker

# Use bge-m3 as the global embedding model
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-m3")

# Rerank retrieved nodes, keeping the top 3
rerank = FlagEmbeddingReranker(top_n=3, model="BAAI/bge-reranker-large")
@sysfor I have the same concern for French.
How can I fine-tune an embedding model?
For French, you could look at using the mistral-embed model. They are French after all, so I would suspect it works well.
It's "cheap" price-wise, but I suspect bge-m3 would work too. I've only really ever looked at English since that's what I use.
# pip install llama-index-embeddings-mistralai
from llama_index.core import Settings
from llama_index.embeddings.mistralai import MistralAIEmbedding

Settings.embed_model = MistralAIEmbedding(model_name="mistral-embed", api_key="<YOUR_API_KEY>")
^ I've not done this myself, but I'll consider it to improve my RAG pipeline.