@Logan M i'm talking about the gte version of Qwen2 -
https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct. It currently leads the MTEB leaderboard (among Apache-licensed models). It's extremely useful for me thanks to its multilingual capabilities. However, I'm currently running 2x L40S, and after loading a 70B model with transformers I'm unable to fit the Qwen embedding model, since it gets placed on one card only. That's why I was interested in using a quantized embedding model. I'll do some testing and let you guys know
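fwiw, a rough back-of-envelope for whether the quantized 7B embed model fits next to the 70B (a minimal sketch - the 48 GB per L40S and the ~20% runtime overhead factor are my assumptions, not measured numbers):

```python
def vram_gb(n_params_billions: float, bytes_per_param: float,
            overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GB: weight size times an assumed ~20%
    overhead for activations/buffers at inference time."""
    return n_params_billions * bytes_per_param * overhead

# assumption: L40S has 48 GB, so 2x L40S = 96 GB total
total_gb = 2 * 48

embed_fp16 = vram_gb(7, 2.0)   # 7B embed model in fp16
embed_4bit = vram_gb(7, 0.5)   # same model quantized to 4-bit

print(f"embed fp16: ~{embed_fp16:.1f} GB, 4-bit: ~{embed_4bit:.1f} GB")
```

so quantizing the embed model should free up roughly 12-13 GB vs fp16. Also worth noting: transformers can shard a model across both cards with `device_map="auto"` (needs accelerate installed), which might avoid the one-card placement problem entirely.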