
Embedding

Hi. How do I load a fine-tuned transformer model as a custom embedding model?
32 comments
You want to use your embedding model in place of existing ones?

If so, then you can extend the BaseEmbedding class to define your embed_model and use it.

https://docs.llamaindex.ai/en/stable/module_guides/models/embeddings.html#custom-embedding-model
Plain Text
from llama_index.embeddings import HuggingFaceEmbedding

# now, I'm loading embeddings like this:
embed_model = HuggingFaceEmbedding(
    model_name='jinaai/jina-embeddings-v2-base-en',
    pooling='mean',
)

But it is pretrained. I fine-tuned a bge for my use case, and I want to load that model instead of this pretrained one.
Is the fine-tuned model available on HF?
No, it's not available on HF. I basically trained a bge for my use case and have it stored locally.
Okay, then you can use the following to load your own embed model.
Suppose I defined a class like this:
class InstructorEmbeddings(BaseEmbedding)

How do I load this class to use it in LlamaIndex? Also, can I load a model like this for my use case?
[Attachment: image.png]
Once you define the class, you can instantiate it for your own model.
Found one full example for your case: https://docs.llamaindex.ai/en/stable/examples/embeddings/custom_embeddings.html
got it. Thank you so much for the resolution πŸ’•
Hi.
I was able to integrate my custom embedding model. But what I observed is that when I run the same query multiple times, the generated embeddings are identical, yet I get different similarity scores, and because of that the node order changes too. It is the same index in both cases.
@WhiteFang_Jr @Logan M can you guys please help me with this?
For the same query with no change?

Are you using a chat engine? If you're using condense mode, it changes your query, so that could be one reason.
No, I'm using query engine.
Yes, same query with no changes. I checked the stored embeddings; they are the same. But the similarity score is not.

for reference:

Plain Text
query_engine = index.as_query_engine(
    response_synthesizer=response_synthesizer,
    similarity_top_k=kVal,
)
responses = query_engine.query(qry + ' in the document. Answer in a sentence')
I just checked: there was a slight difference in the similarity score. One time it detected the node with 0.70, and the next time with 0.71.
I guess your nodes are being retrieved with almost identical scores, so a little up-and-down movement is changing the order.
Yes. That is what I observed.
Is it changing the response?
Yes it is changing response.
Why is there a difference in the similarity score when the query and the index are the same both times?
I don't really have much idea of the embedding model's inner workings. We will have to wait for Logan's input on this.
Meanwhile, you can interact with the model directly and check whether it generates different scores for the same text.


Plain Text
# Run this a few times to check whether the score varies
embedding_node = embed_model.get_text_embedding(node_text)
embedding_query = embed_model.get_query_embedding(query)  # use the query path, as the pipeline does

# compare for score
print(embed_model.similarity(embedding_node, embedding_query))
calculating embeddings is not always 100% the same each time. Depending on your hardware, very minor variations can change the similarity score (i.e. 0.70->0.71), especially for local models.

As @WhiteFang_Jr mentioned though, you can test yourself with the above.

Not really much else I can add πŸ˜…
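Logan's point about tiny numeric variation can be illustrated without any model at all: computing cosine similarity in float32 instead of float64 (or with a different accumulation order) shifts the score in the low decimal places, which is enough to swap two closely ranked nodes. A minimal sketch in plain NumPy, with random vectors standing in for embeddings (no LlamaIndex involved):

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity, computed in the dtype of the inputs."""
    a, b = np.asarray(a), np.asarray(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
v = rng.normal(size=768)                 # stand-in for a 768-dim query embedding
w = v + 0.05 * rng.normal(size=768)      # a "close" node embedding

s64 = cosine(v, w)                                         # float64 score
s32 = cosine(v.astype(np.float32), w.astype(np.float32))   # float32 score

print(f"float64: {s64:.8f}  float32: {s32:.8f}  diff: {abs(s64 - s32):.2e}")
```

The dtype round-trip alone only moves the score slightly; larger shifts like 0.70 vs 0.71 typically come from the model's forward pass itself (different hardware, batching, or non-deterministic kernels), which is why checking the raw scores directly, as suggested above, is the right diagnostic.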
Hi @Logan M. But the embeddings created are the same every time; just the similarity score is not consistent.
Moreover, the LlamaIndex pipeline is giving inconsistent results, but when I calculate it manually as @WhiteFang_Jr suggested, I get consistent results every time.
Hi @WhiteFang_Jr
With LlamaIndex I'm sometimes getting different embeddings for the same query. Logan seems to say the same thing.
When I load the model and create embeddings without the LlamaIndex pipeline, I get the same embeddings every time.
So I'm thinking: can I somehow create the embeddings locally and pass them to the LlamaIndex pipeline for retrieval of similar nodes and the final prediction? By locally I mean creating the embeddings with some external class, without using the LlamaIndex pipeline.
Can we do it?
I think you are already using the custom embedding wrapper, right?

I think the same thing is still being done: you pass the text to your model and it returns the embedding, right?
Yes, I'm using a custom embedding wrapper in LlamaIndex, but the embeddings are not the same every time.
But if I just create embeddings outside the LlamaIndex pipeline, I get the same embeddings every time.
You can try and check if this works. The similarity is actually computed inside the similarity method, as mentioned here: https://discord.com/channels/1059199217496772688/1171707808518000670/1174642081319354438
I understand the similarity part.
But I'm getting two different embeddings for the same query without any change.
I'm attaching 2 screenshots of embeddings for reference
any idea why this might be happening ?
[Attachments: image.png, image.png]
Not really though πŸ˜…
When I use my custom model to generate the embeddings outside the LlamaIndex pipeline, I get the same embeddings every time. I get different embeddings only when I generate them through the LlamaIndex pipeline.
Just giving extra information
When you are using the custom embed model, it means that you have your own embed model, right?

If fetching the embedding from outside is working, then you can try going with that.

Also, when you do the following steps: https://discord.com/channels/1059199217496772688/1171707808518000670/1174642081319354438, you are getting the same embedding every time.

I guess you can try adding an extra layer in between to fetch the embeddings.
Yes, I have my own embed model. I will try to fetch embeddings with an extra layer.
Thanks for the help πŸ’• .