Hello @Logan M! I'm trying to be an early adopter of the new Nomic AI embedding model, but I'm running into an error. Unfortunately I can't use their API, so it has to run locally. I'm embedding around 100k nodes on a T4 machine into a Weaviate vector DB.

I am defining the model like this:

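from transformers import AutoModel, AutoTokenizer
from llama_index.embeddings import HuggingFaceEmbedding  # import path assumed (pre-0.10 llama_index)

# The Nomic model ships custom modeling code, hence trust_remote_code=True
model = AutoModel.from_pretrained("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

embed_model = HuggingFaceEmbedding(
    model=model,
    tokenizer=tokenizer,
    max_length=2048,
)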

I'm trying to keep the index insert batch size small:

index = VectorStoreIndex(nodes, storage_context=storage_context, service_context=service_context, show_progress=True, insert_batch_size=512)

This is the error I'm getting:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 54.00 MiB. GPU 0 has a total capacity of 14.58 GiB of which 45.56 MiB is free. Including non-PyTorch memory, this process has 14.53 GiB memory in use. Of the allocated memory 14.08 GiB is allocated by PyTorch, and 335.53 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Any idea? 🙂
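The error message itself suggests one thing to try: the `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` allocator setting. A minimal sketch of applying it from Python is below. Note that in the log above only 335.53 MiB is reserved but unallocated, against 14.08 GiB actually allocated, so fragmentation is probably not the main cause here; this is just a cheap thing to rule out.

```python
import os

# Set the allocator option named in the error message.
# To be safe, do this before torch is imported anywhere in the process.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# import torch  # import torch only after the variable is set

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

The same effect can be had by exporting the variable in the shell before starting the script.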

19 comments

Logan M

try changing the embed batch size

embed_model = HuggingFaceEmbedding(..., embed_batch_size=2)

Logan M

I actually haven't tested this model yet either lol

Logan M

You might also want to change the pooling

Logan M

embed_model = HuggingFaceEmbedding(..., pooling="mean")

Logan M

I need to make that more automatic

JAX

yeah so batch size = 2 doesn't work, same error

JAX

same with the pooling

Logan M

pooling is more for the actual result
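To illustrate that point: pooling only decides how the per-token vectors coming out of the model are collapsed into one embedding, so it changes the result, not the memory used by the forward pass. Below is a minimal pure-Python sketch of mean pooling versus taking the first (CLS) token. The function names and toy numbers are made up for illustration and are not LlamaIndex's implementation.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask is 1, ignoring padding tokens."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    if count == 0:
        raise ValueError("attention_mask contains no real tokens")
    return [t / count for t in total]


def cls_pool(token_embeddings: List[List[float]]) -> List[float]:
    """Use the first token's vector as the embedding."""
    return list(token_embeddings[0])


# Toy example: three token vectors, the last one is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]

print(mean_pool(tokens, mask))  # [2.0, 3.0]
print(cls_pool(tokens))         # [1.0, 2.0]
```

Both functions consume the same model output, which is why switching `pooling` did not change the out-of-memory behaviour.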

Logan M

does batch size of 1 work?

Logan M

if not, I think the model is just too big for your GPU lol

JAX

that would be sad, I mean it's a T4 w 16GB VRAM

Logan M

You can also lower the max length too

Logan M

1024 is probably good enough
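A rough back-of-the-envelope sketch of why lowering the max length helps so much: with plain (non-fused) self-attention, each layer builds a score tensor of shape (batch, heads, seq_len, seq_len), so its size grows with the square of the sequence length. The numbers below assume 12 attention heads and 32-bit floats; both are assumptions for illustration, and real memory use also includes weights, other activations and allocator overhead.

```python
def attention_scores_bytes(batch_size: int, seq_len: int,
                           num_heads: int = 12, bytes_per_value: int = 4) -> int:
    """Bytes for one layer's (batch, heads, seq_len, seq_len) attention score tensor.

    num_heads=12 and 4-byte floats are assumed values, not taken from the thread.
    """
    return batch_size * num_heads * seq_len * seq_len * bytes_per_value


MIB = 1024 ** 2

for seq_len in (2048, 1024):
    size = attention_scores_bytes(batch_size=1, seq_len=seq_len)
    print(f"max_length={seq_len}: {size / MIB:.0f} MiB per layer")
# max_length=2048: 192 MiB per layer
# max_length=1024: 48 MiB per layer
```

Under these assumptions, halving the length cuts this term by 4x, while halving the batch size only cuts it by 2x.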

JAX

ok so batch size 1 seems to be ok (even though gonna take probably 2 centuries to finish)

JAX

batch size 1 and length 2048

JAX

ok i said yey too early, crashed after about 10k nodes 😄
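A crash partway through suggests that most batches fit and only some (probably the ones with the longest nodes) do not. One possible workaround, sketched below, is to start with a larger batch and halve it only when a batch fails. This is a hypothetical helper, not something from LlamaIndex: `embed_with_backoff` and `fake_embed` are made-up names, and `MemoryError` stands in for `torch.cuda.OutOfMemoryError`, which is what you would catch in a real PyTorch run.

```python
from typing import Callable, List, Sequence


def embed_with_backoff(texts: Sequence[str],
                       embed_fn: Callable[[Sequence[str]], List[List[float]]],
                       batch_size: int = 8) -> List[List[float]]:
    """Embed texts in batches, halving the batch size whenever a batch runs out of memory."""
    results: List[List[float]] = []
    start = 0
    while start < len(texts):
        batch = texts[start:start + batch_size]
        try:
            results.extend(embed_fn(batch))
            start += len(batch)
        except MemoryError:
            if batch_size == 1:
                raise  # even a single text does not fit
            # The reduced size is kept for the rest of the run.
            batch_size = max(1, batch_size // 2)
    return results


def fake_embed(batch: Sequence[str]) -> List[List[float]]:
    """Stand-in embedder: 'runs out of memory' above 10 characters per batch."""
    if sum(len(t) for t in batch) > 10:
        raise MemoryError("simulated OOM")
    return [[float(len(t))] for t in batch]


texts = ["aaaa", "bbbb", "cccc", "dd"]
vectors = embed_with_backoff(texts, fake_embed, batch_size=4)
print(vectors)  # [[4.0], [4.0], [4.0], [2.0]]
```

In the toy run the first batch of 4 fails, the size drops to 2, and the remaining batches succeed without restarting from scratch.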

JAX

trying 1024 length + batch size 1

JAX

small update - I was able to do it with 1024 length and a batch size of 1. It did take a while 🙂

A separate question: I'm using the Sentence Window approach in my system, so the embedding is generally done on small chunks of text. Could that be why mpnet_base_v2 has so far been literally the 🐐 among the embeddings I've tried? BGE / OpenAI / Jina / Nomic are absolutely terrible compared to mpnet_base_v2. Does that reasoning make sense? If so, I guess the focus should be on trying embeddings that were trained on short text lengths (not sure what the right word for it is). Would you recommend something specific?
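For readers unfamiliar with the sentence-window idea, here is a minimal pure-Python sketch of the concept: each node's embedded text is a single sentence, while the surrounding sentences are stored alongside it for use at answer time. The function name and dictionary keys are hypothetical and this is not LlamaIndex's actual parser.

```python
from typing import Dict, List


def build_sentence_windows(sentences: List[str], window_size: int = 1) -> List[Dict[str, str]]:
    """Pair each sentence (what gets embedded) with its neighbours (extra context)."""
    nodes = []
    for i, sentence in enumerate(sentences):
        lo = max(0, i - window_size)
        hi = min(len(sentences), i + window_size + 1)
        nodes.append({
            "text": sentence,                      # short text sent to the embedding model
            "window": " ".join(sentences[lo:hi]),  # wider context kept as metadata
        })
    return nodes


sentences = ["A.", "B.", "C.", "D."]
nodes = build_sentence_windows(sentences, window_size=1)
print(nodes[0])  # {'text': 'A.', 'window': 'A. B.'}
print(nodes[2])  # {'text': 'C.', 'window': 'B. C. D.'}
```

Because only the `text` field is embedded, the embedding model mostly sees single sentences, which is the setup the question above is asking about.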

bin4ry_d3struct0r

I'm experiencing a similar issue, but I'll wait until the new release this week before I post my own questions.
