

At a glance

The community members discuss which part of the model stack is used for learning during training and for calculating embeddings during retrieval with models like Jina AI's. The main points are:

- The entire model stack is used, with the output of the last layer (the hidden state) providing the embeddings.

- Embeddings are typically created by pooling the per-token vectors, for example by averaging them.

- The community members agree that this is the default approach for embedding models.

- They also discuss whether the encoder part of a large language model (LLM) could be used to calculate embeddings, especially if the model is fine-tuned on a task close to the retrieval task.

- The community members note that most LLMs are decoder-only, but that there are ways to get embeddings from decoder models, for example in the llama.cpp library (though the exact details are not provided).

- One community member clarifies that they meant the part of the model without the language-model head (the linear layer and softmax), which is essentially an encoder.

There is no explicitly marked answer in the provided information.

Can someone explain which part of this stack is used to learn during training and to calculate embeddings during retrieval with embedding models like jinaai?
Attachment: image.png
7 comments
This entire stack is used πŸ‘€

The output of the last layer is called the hidden state, and this is what's used for embeddings.

There's an embedding vector for each token, and these are typically pooled (e.g. by averaging, or with other techniques)
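A minimal sketch of that pooling step, assuming a Hugging Face encoder loaded with AutoModel (the checkpoint name is only an example; Jina's embedding models expose their hidden states the same way):

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Example checkpoint; any encoder-style embedding model works the same way
model_name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

batch = tokenizer(["which part of the stack makes the embedding?"],
                  padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    # last_hidden_state: (batch, seq_len, hidden_dim) -- one vector per token
    hidden = model(**batch).last_hidden_state

# Mean pooling: average the token vectors, masking out padding positions
mask = batch["attention_mask"].unsqueeze(-1).float()
embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embedding.shape)  # (1, hidden_dim)
```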
is that the default with embedding models?
It is, at least to my knowledge πŸ‘
does that mean I could simply take the encoder part of an LLM to calculate embeddings?
especially if it's fine-tuned on a specific task close to the retrieval task?
Most LLMs are decoder-only, actually.

But there are ways to get embeddings from decoder models; I know llama.cpp does it (I can't remember exactly how it works, though)
I know that's the lingo some people use. However, I was merely talking about the part without the lm_head (the linear layer and softmax) to calculate the embeddings, which is always just an encoder 🤓 Because if you take away the head from a decoder, it's basically an encoder.
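To make that concrete, here is a minimal, hedged sketch of one common way to get embeddings from a decoder-only LLM: load it without the lm_head via AutoModel and pool the hidden state of the last non-padding token (with causal attention, only the final token has seen the whole input). The gpt2 checkpoint is just a small stand-in, and this is not necessarily how llama.cpp does it:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Small decoder-only stand-in; larger LLMs work identically
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token
model = AutoModel.from_pretrained(model_name)  # no lm_head: hidden states only

batch = tokenizer(["Decoder models can embed text too."],
                  padding=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden_dim)

# Last-token pooling: index the hidden state of each sequence's final real token
lengths = batch["attention_mask"].sum(dim=1) - 1
embedding = hidden[torch.arange(hidden.size(0)), lengths]
print(embedding.shape)  # (1, hidden_dim)
```

Mean pooling over all tokens, as in the encoder example above, is also used with decoder models; which works better depends on how the model was fine-tuned.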