The community members are discussing which part of the model stack is used for learning during training and for calculating embeddings during retrieval, in embedding models such as those from Jina AI. The main points are:
- The entire model stack is used; the output of the last layer (the last hidden state) supplies the per-token representations for the embeddings.
- The final embedding is typically created by pooling those per-token hidden states across the sequence, most commonly by mean pooling (see the first sketch after this list).
- The community members discuss whether this is the default approach for embedding models, and agree that it is.
- They also discuss whether the encoder part of a large language model (LLM) could be used to calculate embeddings, especially if the model is fine-tuned on a task similar to the retrieval task.
- The community members note that most LLMs are decoder-only, but there are ways to get embeddings from decoder models, such as via the llama.cpp library (the exact details are not provided in the thread; a hedged sketch follows this list).
- One community member clarifies that they were referring to the part of the model without the language model head (the linear layer and softmax), which is essentially an encoder.
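As referenced in the list above, here is a minimal sketch of the mean-pooling approach using Hugging Face transformers. The checkpoint name is only an example of a small public embedding model, and the masking step assumes a padded batch:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Small public embedding model, used here purely as an example.
model_name = "sentence-transformers/all-MiniLM-L6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

texts = ["How do embedding models pool token states?"]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    out = model(**batch)  # out.last_hidden_state: (batch, seq_len, hidden)

# Mean-pool over the token axis, masking out padding positions.
mask = batch["attention_mask"].unsqueeze(-1).float()   # (batch, seq_len, 1)
summed = (out.last_hidden_state * mask).sum(dim=1)     # (batch, hidden)
counts = mask.sum(dim=1).clamp(min=1e-9)               # tokens per sequence
embeddings = summed / counts                           # one vector per text

# Optional: L2-normalize so dot products equal cosine similarities.
embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
```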
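For the llama.cpp route mentioned in the list, a hedged sketch via the llama-cpp-python bindings is below. The model path is a placeholder, and depending on the model and library version the call may return one pooled vector or one vector per token:

```python
from llama_cpp import Llama

# embedding=True asks llama.cpp to expose embeddings rather than
# sampling from the lm_head; "./model.gguf" is a placeholder path.
llm = Llama(model_path="./model.gguf", embedding=True)

# Returns the embedding for the input text; shape/pooling behavior
# depends on the underlying model and the library version.
vec = llm.embed("Which part of the model stack produces the embedding?")
```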
There is no explicitly marked answer in the provided information.
I know that's the lingo some people use. However, I was merely talking about the part without the lm_head (the linear layer and softmax) to calculate the embeddings, which is always just an encoder. Because if you take away the head from a decoder, it's basically an encoder.
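To illustrate that point, a minimal sketch with transformers: the hidden states come from the model body before the lm_head is applied, so a decoder-only model yields embeddings the same way an encoder does. With causal attention only the final token has attended to the whole sequence, so last-token pooling is a common choice here (gpt2 is just a small public example):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any decoder-only checkpoint works; gpt2 is a small public example.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

batch = tokenizer("Decoders minus the lm_head act as encoders.",
                  return_tensors="pt")

with torch.no_grad():
    out = model(**batch, output_hidden_states=True)

# out.logits went through the lm_head; out.hidden_states[-1] did not.
last_hidden = out.hidden_states[-1]   # (batch, seq_len, hidden)

# Last-token pooling: under causal attention, the final position has
# seen the entire input, so its state can represent the whole text.
embedding = last_hidden[:, -1, :]     # (batch, hidden)
```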