While using an ingestion pipeline that ingests into a Qdrant vector store, I'm having problems with GPU VRAM. The vectors are stored in Qdrant, but the GPU memory is not released until the Python process is killed.
a) You could lower the batch size. b) You could lower the max length (or chunk size), since E5 has a rather large maximum input size. Memory for the model is allocated lazily on the fly (i.e. an input sequence of 8 tokens will only allocate memory for those 8 tokens; if the next sequence is 16 tokens, memory for an additional 8 tokens is allocated and kept for the rest of the process). A rough sketch of both knobs is below.
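A minimal sketch of the two suggestions, assuming the pipeline embeds with sentence-transformers and an E5 checkpoint; the model name, batch size, and max length below are illustrative, not taken from the original pipeline:

```python
# Sketch: cap peak GPU memory during ingestion by limiting batch size and
# input length. Model name and numbers are illustrative assumptions.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-large", device="cuda")

# (b) Cap the max input length. The allocator grows to the longest sequence
# it has seen, so a smaller cap keeps the cached allocation small.
model.max_seq_length = 256  # tokens; tune to your chunk size

# E5 expects the "passage:" prefix for documents being indexed.
texts = ["passage: example document chunk"] * 1000

# (a) Lower the batch size so the peak allocation per forward pass stays small.
embeddings = model.encode(texts, batch_size=16, show_progress_bar=True)
```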
I could see E5 using 80 GB with both a large batch size and large inputs.