@Logan M i'm talking about the gte version of Qwen2 -
https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct. It currently leads the MTEB leaderboard (among Apache-licensed models). It's extremely useful for me thanks to its multilingual capabilities. However, I'm currently running 2x L40S, and after loading a 70B model with transformers I'm unable to fit the Qwen embedding model, since it gets placed on one card only. That's why I was interested in using a quantized embedding model. I'll do some testing and let you guys know
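fwiw, a rough back-of-envelope for whether the quantized 7B embed model fits next to the 70B (a minimal sketch - the 48 GB per L40S and the ~20% runtime overhead factor are my assumptions, not measured numbers):

```python
def vram_gb(n_params_billions: float, bytes_per_param: float,
            overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GB: weight size times an assumed ~20%
    overhead for activations/buffers at inference time."""
    return n_params_billions * bytes_per_param * overhead

# assumption: L40S has 48 GB, so 2x L40S = 96 GB total
total_gb = 2 * 48

embed_fp16 = vram_gb(7, 2.0)   # 7B embed model in fp16
embed_4bit = vram_gb(7, 0.5)   # same model quantized to 4-bit

print(f"embed fp16: ~{embed_fp16:.1f} GB, 4-bit: ~{embed_4bit:.1f} GB")
```

so quantizing the embed model should free up roughly 12-13 GB vs fp16. Also worth noting: transformers can shard a model across both cards with `device_map="auto"` (needs accelerate installed), which might avoid the one-card placement problem entirely.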