
Updated 11 months ago


At a glance

The community member is looking for a recommendation for an embedding model for Swedish, as they are currently using https://huggingface.co/intfloat/multilingual-e5-large but are not getting great results with RAG. Other community members suggest trying reranking with a combination of BAAI/bge-m3 for embeddings and BAAI/bge-reranker-large for reranking. They also suggest fine-tuning the embedding model, and provide example code for using the FlagEmbeddingReranker and HuggingFaceEmbedding from the llama_index library. Another community member asks about fine-tuning the embedding model for French, and is recommended to try the mistral-embed model or the BAAI/bge-m3 model.

Can anyone recommend an embedding model for Swedish? I am currently using https://huggingface.co/intfloat/multilingual-e5-large but am not getting great results with RAG. It works, but seems oddly fixated on specific chunks for no clear reason.
11 comments
Have you tried reranking?

I'm not sure specifically about Swedish, but a good open combination might be BAAI/bge-m3 for embeddings with BAAI/bge-reranker-large as the reranker.
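To make the retrieve-then-rerank idea concrete, here is a minimal, self-contained sketch in plain Python. The vectors and the term-overlap scorer are toy stand-ins: in practice the embeddings would come from a model like bge-m3 and the rerank scores from a cross-encoder like bge-reranker-large.

```python
def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def retrieve(query_vec, docs, k):
    """Stage 1: cheap vector similarity over the whole corpus."""
    return sorted(docs, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)[:k]

def rerank(query_terms, candidates, top_n):
    """Stage 2: a more expensive scorer applied only to the candidates.
    A toy term-overlap count stands in for a cross-encoder here."""
    def score(doc):
        return len(set(query_terms) & set(doc["text"].split()))
    return sorted(candidates, key=score, reverse=True)[:top_n]

# Made-up corpus with made-up 2-d "embeddings"
docs = [
    {"text": "swedish tax rules for companies", "vec": [0.9, 0.1]},
    {"text": "recipes for swedish meatballs", "vec": [0.8, 0.2]},
    {"text": "corporate tax deadlines", "vec": [0.1, 0.9]},
]
candidates = retrieve([1.0, 0.0], docs, k=2)
best = rerank(["swedish", "tax"], candidates, top_n=1)
print(best[0]["text"])  # the reranker promotes the tax doc over the meatballs
```

The point of the two stages: stage 1 can be fixated on superficially similar chunks (like the embedding model in the question), and the reranker gets a second, more careful look at only the shortlist.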
You could also look at fine-tuning your embedding model.
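On fine-tuning: whichever framework you use, the raw material is (query, relevant passage) pairs mined from your own corpus. A minimal sketch of that data shape follows; the Swedish strings and IDs are made-up examples, and the corpus/queries/relevant_docs layout is modeled on what LlamaIndex's embedding fine-tuning dataset expects (check the current docs before relying on it).

```python
import json

# Hypothetical corpus chunks, keyed by node ID
corpus = {
    "n1": "Anmälan om flytt ska göras till Skatteverket inom en vecka.",
    "n2": "Styrelsen sammanträder första måndagen varje månad.",
}

# Hypothetical synthetic queries (often generated by an LLM over each chunk)
queries = {
    "q1": "När ska flyttanmälan göras?",
}

# Which chunk answers which query — the positive pairs used for training
relevant = {"q1": ["n1"]}

dataset = {"corpus": corpus, "queries": queries, "relevant_docs": relevant}
serialized = json.dumps(dataset, ensure_ascii=False)
```

Once you have pairs like these, a fine-tuning engine trains the embedding model so that each query lands closer to its relevant chunk than to the rest of the corpus.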
# pip install llama-index-embeddings-huggingface llama-index-postprocessor-flag-embedding-reranker
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.postprocessor.flag_embedding_reranker import FlagEmbeddingReranker

# Use bge-m3 as the global embedding model
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-m3")

# Rerank retrieved nodes, keeping the top 3
rerank = FlagEmbeddingReranker(top_n=3, model="BAAI/bge-reranker-large")
@sysfor I have the same concern for French.
How can I fine-tune an embedding model?
For French, you could look at using the mistral-embed model. They are French after all, so I would suspect it works well.
It's "cheap" price-wise, but I suspect bge-m3 would work too. I've only really ever looked at English since that's what I use.
# pip install llama-index-embeddings-mistralai
from llama_index.core import Settings
from llama_index.embeddings.mistralai import MistralAIEmbedding

Settings.embed_model = MistralAIEmbedding(model_name="mistral-embed", api_key="<YOUR_API_KEY>")
^ I've not done this myself, but I'll consider it to improve my RAG pipeline.