Embeddings

That's kinda... surprising? Embedding models are really small
That is kind of surprising. It's just a light wrapper around sentence transformers too
I'm assuming that's the class you used
I am trying to train jinaai/jina-embeddings-v2-base-en, which is large for an embedding model
Plain Text
from llama_index.finetuning import (
    generate_qa_embedding_pairs,
    EmbeddingQAFinetuneDataset,
    SentenceTransformersFinetuneEngine,
)
import os

# Hugging Face token (redacted)
os.environ["HF_TOKEN"] = "***"

# Load pre-generated (query, context) pairs for training and validation
train_dataset = EmbeddingQAFinetuneDataset.from_json("data/li_full.json")
val_dataset = EmbeddingQAFinetuneDataset.from_json("data/li_test.json")

# Run embedding fine-tuning
finetune_engine = SentenceTransformersFinetuneEngine(
    train_dataset,
    model_id="jinaai/jina-embeddings-v2-base-en",
    model_output_path="test_llama_index_finetune",
    val_dataset=val_dataset,
)
finetune_engine.finetune()
embed_model = finetune_engine.get_finetuned_model()
embed_model
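For reference, a quick sanity check on the returned model could look like this (a minimal sketch, assuming the run above completes; get_text_embedding is the standard LlamaIndex embedding interface, and 768 is the output dimension for jina-embeddings-v2-base-en):
Plain Text
# Embed a test string with the fine-tuned model and check the vector size
vec = embed_model.get_text_embedding("What does SentenceTransformersFinetuneEngine do?")
print(len(vec))  # 768 for jina-embeddings-v2-base-en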
Yeah, it's only like 275 MB, hey?

I honestly have no idea what the issue is, especially since LlamaIndex isn't really doing much if you look at the code
I feel like there should be an option to specify the batch size, but it's not there
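If the installed version of the engine really doesn't expose a batch size, one workaround sketch is to fine-tune with sentence-transformers directly, where the batch size is just the DataLoader's. The pairs below are hypothetical placeholders, and this only approximates what the LlamaIndex engine does under the hood:
Plain Text
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Hypothetical (query, passage) pairs; in practice pull these from the QA dataset
pairs = [("What is X?", "X is ..."), ("How does Y work?", "Y works by ...")]

# jina-embeddings-v2 ships custom modeling code, hence trust_remote_code
model = SentenceTransformer("jinaai/jina-embeddings-v2-base-en", trust_remote_code=True)

train_examples = [InputExample(texts=[q, p]) for q, p in pairs]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=4)  # explicit batch size
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=2,
    warmup_steps=50,
    output_path="jina_finetune_small_batch",
)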
Actually, I think it's the context length
It's 8K, and there's no flash attention or anything like we use in LLM land
Shouldn't the context length be reflected in the size of the model weights, though? Or maybe I don't understand how context length works in these newer models

https://huggingface.co/jinaai/jina-embeddings-v2-base-en/tree/main
Nah, it's a separate parameter
In fact, in transformer models memory scales non-linearly with context length. It scales with model weights as well, but TBH I'm not sure why it's so hungry.
No flash attention, at least
I just thought there had to be trained/saved parameters associated with each token/context position 🤔 (which would show up in the saved model size?) But yeah, otherwise a large context size does increase memory usage a ton
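As a rough illustration of that point: the 8K limit is a config/runtime value rather than extra saved weights (jina-v2 uses ALiBi-style attention biases instead of learned position embeddings, if I remember right), so one way to tame memory during fine-tuning is to cap the sequence length. A minimal sketch, assuming the sentence-transformers loader works for this model:
Plain Text
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jinaai/jina-embeddings-v2-base-en", trust_remote_code=True)

# The context limit lives here, not in a separate weight tensor
print(model.max_seq_length)  # typically 8192 for this model

# Capping it shrinks activation memory during training (attention cost grows
# with sequence length) while the saved weights stay the same size
model.max_seq_length = 512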
Actually this happens with PEFT training as well.
IDK if that's what you meant