
While using an ingestion pipeline that ingests into a Qdrant vector store, I'm having problems with GPU VRAM. The vectors are stored in Qdrant, but the GPU memory does not flush until the Python process is killed.
4 comments
Could it be the embedding model holding the space? 🤔
Could be, but 80 GB is a bit too much for an E5-small
Also, you can see how the used memory increases as the batches get processed
a) you could lower the batch size
b) you could lower the max length (or chunk size), since E5 has a rather large maximum input size. Memory for the model is allocated lazily on the fly (i.e. an input sequence of 8 tokens only allocates memory for those 8 tokens; if the next sequence is 16 tokens, memory for an additional 8 tokens is allocated)
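For what it's worth, here is a rough sketch of turning both knobs down. It assumes a LlamaIndex `IngestionPipeline` with a HuggingFace E5 checkpoint and a local Qdrant instance; the model name, collection name, and exact import paths are assumptions and may differ per version:

```python
from llama_index.core import Document
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

# Smaller batches and a shorter max input bound the activation memory
# the embedding model can lazily grow into.
embed_model = HuggingFaceEmbedding(
    model_name="intfloat/e5-small-v2",  # assumed E5 checkpoint
    embed_batch_size=8,                 # a) lower batch size
    max_length=512,                     # b) cap tokens per input sequence
)

vector_store = QdrantVectorStore(
    client=QdrantClient(url="http://localhost:6333"),
    collection_name="docs",             # assumed collection name
)

pipeline = IngestionPipeline(
    transformations=[
        # Keep chunks within max_length so nothing gets silently truncated
        SentenceSplitter(chunk_size=512, chunk_overlap=64),
        embed_model,
    ],
    vector_store=vector_store,
)

documents = [Document(text="example text to embed and store in Qdrant")]
pipeline.run(documents=documents)
```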

I could see E5 using 80 GB with both a large batch size and large inputs.
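On the original symptom (VRAM not being released until the process exits): PyTorch's caching allocator keeps freed blocks reserved for reuse, so nvidia-smi keeps attributing them to the process even after the batches finish. A minimal sketch of explicitly releasing them after the run; `embed_model` is just a placeholder name for whatever object holds the E5 weights:

```python
import gc
import torch

# Drop the reference to the model so its weights and buffers can be collected.
embed_model = None
gc.collect()

# empty_cache() returns the cached-but-unallocated blocks to the driver;
# without it, the memory stays attributed to the Python process.
torch.cuda.empty_cache()

print(f"{torch.cuda.memory_allocated() / 1e9:.2f} GB allocated")
print(f"{torch.cuda.memory_reserved() / 1e9:.2f} GB reserved (cached)")
```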