
Hello Guys,

I am working on a RAG project and I ran into a problem while trying to use embeddings from HuggingFace in llama-index, and I was hoping someone could help me.
I tried multiple French/multilingual embedding models from HuggingFace and they do not work (e.g. https://huggingface.co/antoinelouis/biencoder-camembert-base-mmarcoFR). The only ones that work for me are the English BAAI bge family.


I used the code below:


Plain Text
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.finetuning import EmbeddingQAFinetuneDataset
from llama_index.schema import TextNode

# Load the QA dataset and rebuild nodes from its corpus.
dataset = EmbeddingQAFinetuneDataset.from_json("path_to_dataset")

corpus = dataset.corpus
nodes = [TextNode(id_=id_, text=text) for id_, text in corpus.items()]

# Build the index with a local HuggingFace embedding model.
embed_model = HuggingFaceEmbedding("local_path_to_model")
service_context = ServiceContext.from_defaults(embed_model=embed_model)

index = VectorStoreIndex(nodes, service_context=service_context, show_progress=True)

The last line crashes and gives me the error:
Plain Text
CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`

When I try to use the CPU instead, I get another error:
Plain Text
IndexError: index out of range in self
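
For reference, this is the error torch.nn.Embedding raises on CPU whenever an index goes past the size of its embedding table, which suggests some id exceeds what the model expects (a minimal sketch with made-up sizes, unrelated to my actual data):

Plain Text
import torch

# A small embedding table for illustration.
emb = torch.nn.Embedding(num_embeddings=514, embedding_dim=8)

# Any index >= 514 overflows the table and raises the same error.
ids = torch.tensor([[600]])
emb(ids)  # IndexError: index out of range in self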


Here are some specs :
  • OS : Linux
  • GPU : Nvidia A100 80GB of VRAM
  • CUDA version : 11.8 (I am unable to change this because I am not the admin on this machine).
  • torch version : 2.0.1 (To my knowledge, it's the latest one that works with this version of CUDA).
  • python version : 3.10.6
  • llama-index version : 0.8.65
PS: This embedding model works just fine for RAG in LangChain.
PS-2: A coworker told me that the model I tried to use doesn't have a pooling layer. When it is used via sentence-transformers (hence LangChain), mean pooling is applied. HuggingFaceEmbedding from llama_index.embeddings might not apply that pooling.
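
For reference, here is roughly what that mean pooling amounts to when done by hand (a minimal sketch assuming the transformers library; the input text is just an example):

Plain Text
import torch
from transformers import AutoModel, AutoTokenizer

name = "antoinelouis/biencoder-camembert-base-mmarcoFR"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

inputs = tokenizer("Bonjour le monde", return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    token_embeddings = model(**inputs).last_hidden_state  # (1, seq_len, hidden)

# Mean pooling: average the token vectors, ignoring padding via the attention mask.
mask = inputs["attention_mask"].unsqueeze(-1).float()  # (1, seq_len, 1)
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)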

Thank you very much for your help,
Kind regards,
Mahmoud
15 comments
Hmmm this worked for me, at least on CPU

Plain Text
>>> from llama_index import ServiceContext, VectorStoreIndex
>>> from llama_index.embeddings import HuggingFaceEmbedding
>>> embed_model = HuggingFaceEmbedding(model_name="antoinelouis/biencoder-camembert-base-mmarcoFR")
>>> service_context = ServiceContext.from_defaults(embed_model=embed_model)
>>> from llama_index import Document
>>> index = VectorStoreIndex.from_documents([Document.example()], service_context=service_context)
>>> response = index.as_query_engine().query("Tell me about LLMs")
>>> str(response)
'LLMs are a type of technology that is used for knowledge generation and reasoning. They are pre-trained on large amounts of publicly available data.'
LlamaIndex has pooling for embeddings actually
[Attachment: image.png]
Maybe if you share the full error we can better see the issue?

I noticed this model has an age-old issue; I think you need to set max_length=512 in the constructor
antoinelouis/biencoder-camembert-base-mmarcoFR has a French vocabulary (fr). Your query is in English (en). Why don't you use a French query?
hmm, I know very little French hahaha so I guess that's why (the language shouldn't matter much for testing here)

Looking at the config.json for the model, I think setting max_length=512 is the solution here
Give that a try and let me know if it helps

Plain Text
embed_model = HuggingFaceEmbedding("antoinelouis/biencoder-camembert-base-mmarcoFR", max_length=512)
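
For context, camembert-based checkpoints have max_position_embeddings=514, and if the model repo doesn't pin a tokenizer limit, the tokenizer's model_max_length can fall back to a huge sentinel value, so unclamped inputs can overflow the position-embedding table (hence the IndexError). A quick way to inspect this (a sketch, assuming transformers is installed):

Plain Text
from transformers import AutoConfig, AutoTokenizer

name = "antoinelouis/biencoder-camembert-base-mmarcoFR"
config = AutoConfig.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)

print(config.max_position_embeddings)  # 514 for camembert-base derivatives
print(tokenizer.model_max_length)      # may be a very large sentinel if unset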
Hi, thank you so much for your answer. Yes, setting max_length=512 solves the problem. Thanks also for your answer on the issue I created on GitHub: https://github.com/run-llama/llama_index/issues/8837
Awesome! Yea I posted in both places to make sure the answer was accessible for others πŸ™‚
Yes, it would have been better to query in French. But as @Logan M said, I was just trying to debug my code, so English didn't matter in this case.
@Mahmoud In your code, I see dataset = EmbeddingQAFinetuneDataset("path_to_dataset")
Would it be possible to know which dataset you are using?
Yes, of course.

It is a custom dataset built from French PDFs using the QA embedding fine-tuning function from llama-index
Which fine-tuning embeddings QA function from LlamaIndex did you use?
I used the SentenceTransformersFinetuneEngine object from llama_index.finetuning: https://gpt-index.readthedocs.io/en/stable/examples/finetuning/embeddings/finetune_embedding.html
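
Roughly, the pipeline from that notebook looks like this (a sketch; the file names, model ids and output paths are placeholders, not my exact setup):

Plain Text
from llama_index import SimpleDirectoryReader
from llama_index.node_parser import SimpleNodeParser
from llama_index.finetuning import (
    SentenceTransformersFinetuneEngine,
    generate_qa_embedding_pairs,
)

# 1) Parse the French PDFs into text nodes.
docs = SimpleDirectoryReader(input_files=["my_french_doc.pdf"]).load_data()
nodes = SimpleNodeParser.from_defaults().get_nodes_from_documents(docs)

# 2) Generate (question, context) pairs; an LLM writes the questions.
train_dataset = generate_qa_embedding_pairs(nodes)
train_dataset.save_json("train_dataset.json")

# 3) Fine-tune a sentence-transformers model on those pairs.
finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,
    model_id="antoinelouis/biencoder-camembert-base-mmarcoFR",
    model_output_path="finetuned_model",
)
finetune_engine.finetune()
embed_model = finetune_engine.get_finetuned_model()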