I know that's the lingo some people use. However, I was only talking about the part without the lm_head (i.e. without the final linear layer and softmax) that is used to calculate the embeddings, and that part is always just an encoder. After all, if you take away the head from a decoder, it's basically an encoder.
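To make the split concrete, here is a toy sketch in plain Python (standard library only). It is not any real model or library API. The names `body`, `lm_head` and `embed`, the prefix-mean mixing that stands in for the transformer layers, and the mean pooling are all illustrative assumptions. The point is just that the generation path is body plus head, while the embedding path uses the body alone.

```python
import math
import random

VOCAB = 5  # hypothetical toy vocabulary size
DIM = 4    # hypothetical hidden size

random.seed(0)
# Hypothetical weights: token embedding table and lm_head projection.
EMB = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]
W_HEAD = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(VOCAB)]


def body(token_ids):
    """Toy stand-in for the transformer body: one hidden state per position.

    Each position averages the embeddings of itself and all earlier tokens
    (a crude causal mixing step, assumed here in place of real attention).
    """
    vecs = [EMB[t] for t in token_ids]
    hidden = []
    for i in range(len(vecs)):
        prefix = vecs[: i + 1]
        hidden.append([sum(v[d] for v in prefix) / len(prefix) for d in range(DIM)])
    return hidden


def lm_head(h):
    """Linear layer + softmax: hidden state -> next-token distribution."""
    logits = [sum(w * x for w, x in zip(row, h)) for row in W_HEAD]
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def embed(token_ids):
    """Embedding from the body alone (no lm_head): mean-pool the hidden states."""
    hidden = body(token_ids)
    return [sum(h[d] for h in hidden) / len(hidden) for d in range(DIM)]


ids = [1, 3, 2]
next_token_probs = lm_head(body(ids)[-1])  # generation path: body + head
sentence_vector = embed(ids)               # embedding path: body only
```

Changing `W_HEAD` has no effect on `embed`, which is the sense in which the headless part is what produces the embeddings.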