
Embedding

Hi. How do I load a fine-tuned transformer model as a custom embedding model?
32 comments
You want to use your embedding model in place of existing ones?

If so, then you can extend the BaseEmbedding class to define your embed_model and use it.

https://docs.llamaindex.ai/en/stable/module_guides/models/embeddings.html#custom-embedding-model
Plain Text
from llama_index.embeddings import HuggingFaceEmbedding

# now, I'm loading embeddings like this:
embed_model = HuggingFaceEmbedding(
    model_name='jinaai/jina-embeddings-v2-base-en',
    pooling='mean',
)

But it is pretrained. I fine-tuned a bge for my use case, and I want to load that model instead of this pretrained one.
Is the fine-tuned model available on HF?
No, it's not available on HF. I basically trained a bge for my use case and have it stored locally.
Okay, then you can use the following to load your own embed model.
Suppose I defined a class like this:
class InstructorEmbeddings(BaseEmbedding)

How do I load this class to use it in LlamaIndex? Also, can I load a model like this for my use case?
[Attachment: image.png]
Once you define the class, you can instantiate it for your own model.
Found one full example for your case: https://docs.llamaindex.ai/en/stable/examples/embeddings/custom_embeddings.html
got it. Thank you so much for the resolution πŸ’•
Hi.
I was able to integrate my custom embedding model. But what I observed is that when I run the same query multiple times, the generated embeddings are identical, yet I get different similarity scores, and because of that the node order changes too. It is the same index in both cases.
@WhiteFang_Jr @Logan M can you guys please help me with this?
For the same query with no change?

Are you using a chat engine? If you're using condense mode, it changes your query, so that could be one reason.
No, I'm using query engine.
Yes, same query with no changes. I checked the stored embeddings; they are the same. But the similarity score is not.

for reference:

Plain Text
query_engine = index.as_query_engine(
    response_synthesizer=response_synthesizer,
    similarity_top_k=kVal,
)
responses = query_engine.query(qry + ' in the document. Answer in a sentence')
I just checked: there was a slight difference in the similarity score. One time it detected the node with 0.70, and the next time with 0.71.
I guess your nodes are being retrieved with almost identical scores, so a little up-and-down movement is changing the order.
Yes. That is what I observed.
Is it changing the response?
Yes it is changing response.
Why is there a difference in the similarity score when the query and the index are the same both times?
I don't really have much idea of the embedding model's inner workings. We will have to wait for Logan's input on this.
Meanwhile, you can interact with the model directly and check whether it generates different scores for the same text.


Plain Text
# Run this a few times to check whether the score varies
embedding_node = embed_model.get_text_embedding(node_text)
embedding_query = embed_model.get_query_embedding(query)  # use the query path, as the pipeline does

# compare for score
print(embed_model.similarity(embedding_node, embedding_query))
calculating embeddings is not always 100% the same each time. Depending on your hardware, very minor variations can change the similarity score (i.e. 0.70->0.71), especially for local models.

As @WhiteFang_Jr mentioned though, you can test yourself with the above.

Not really much else I can add πŸ˜…
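Logan's point about tiny numeric variation can be illustrated without any model at all: computing cosine similarity in float32 instead of float64 (or with a different accumulation order) shifts the score in the low decimal places, which is enough to swap two closely ranked nodes. A minimal sketch in plain NumPy, with random vectors standing in for embeddings (no LlamaIndex involved):

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity, computed in the dtype of the inputs."""
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
v = rng.normal(size=768)                 # stand-in for a 768-dim query embedding
w = v + 0.05 * rng.normal(size=768)      # a "close" node embedding

s64 = cosine(v, w)                                         # float64 score
s32 = cosine(v.astype(np.float32), w.astype(np.float32))   # float32 score

print(f"float64: {s64:.8f}  float32: {s32:.8f}  diff: {abs(s64 - s32):.2e}")
```

The dtype round-trip alone only moves the score slightly; larger shifts like 0.70 vs 0.71 typically come from the model's forward pass itself (different hardware, batching, or non-deterministic kernels), which is why checking the raw scores directly, as suggested above, is the right diagnostic.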
Hi @Logan M. But the embeddings created are the same every time; just the similarity score is not consistent.
Moreover, the LlamaIndex pipeline is giving inconsistent results, but when I calculate it manually as @WhiteFang_Jr suggested, I get consistent results every time.
Hi @WhiteFang_Jr
With LlamaIndex I'm sometimes getting different embeddings for the same query. Logan seems to say the same thing.
When I load the model and create embeddings without the LlamaIndex pipeline, I get the same embeddings every time.
So I'm thinking: can I somehow create the embeddings locally and pass them to the LlamaIndex pipeline for retrieval of similar nodes and the final prediction? By locally I mean creating the embeddings with some external class, without using the LlamaIndex pipeline.
Can we do it?
I think you are already using the custom embedding wrapper, right?

I think the same thing is still being done: you pass the text to your model and it returns the embedding, right?
Yes, I'm using a custom embedding wrapper in LlamaIndex, but the embeddings are not the same every time.
But if I just create embeddings outside the LlamaIndex pipeline, I get the same embeddings every time.
You can try and check if this works. The similarity is actually computed inside the similarity method, as mentioned here: https://discord.com/channels/1059199217496772688/1171707808518000670/1174642081319354438
I understand the similarity part.
But I'm getting two different embeddings for the same query without any change.
I'm attaching 2 screenshots of embeddings for reference
any idea why this might be happening ?
[Attachments: image.png, image.png]
Not really though πŸ˜…
When I use my custom model to generate the embeddings outside the LlamaIndex pipeline, I get the same embeddings every time. I get different embeddings only when I generate them through the LlamaIndex pipeline.
Just giving extra information
When you are using the custom embed model, it means that you have your own embed model, right?

If fetching the embedding from outside is working, then you can try going with that.

Also, when you do the following steps: https://discord.com/channels/1059199217496772688/1171707808518000670/1174642081319354438, you are getting the same embedding every time.

I guess you can try adding an extra layer in between to fetch the embeddings.
Yes, I have my own embed model. I will try to fetch embeddings with an extra layer.
Thanks for the help πŸ’• .