# VRAM usage

Hi, I just started learning LlamaIndex and I'm going through this tutorial. I'm using an RTX 3070 with 8 GB of VRAM, and loading the mistral-instruct-7B-Q4_K_M LLM (4.37 GB) with LM Studio.

Everything worked fine at first. I kept asking questions about my document until, at some point, it returned an insufficient-memory error.

So I printed out the VRAM usage at different points in the code, and realised that LlamaIndex uses much more VRAM than I expected, especially since my test text is only 5.4 KB.
Between loading the small text file and building the VectorStoreIndex(), 2.2 GB was consumed.
I have 3 questions:
  1. Why is it using so much memory?
  2. Is there any way to mitigate this?
  3. How could I estimate how much memory is needed for my document?
This is my code, along with the output from `nvidia-smi`.

Thank you very much πŸ™
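
The code from the post wasn't preserved in this capture. As a stand-in, here is a minimal sketch of the setup described above, assuming the standard LlamaIndex starter-tutorial pattern with a local HuggingFace embedding model and LM Studio's OpenAI-compatible server; the model id, data path, embedding model, and endpoint are all placeholders:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.openai_like import OpenAILike

# LM Studio serves an OpenAI-compatible API, by default on localhost:1234.
Settings.llm = OpenAILike(
    model="mistral-instruct-7b-q4_k_m",  # placeholder id; LM Studio uses whatever model is loaded
    api_base="http://localhost:1234/v1",
    api_key="lm-studio",  # LM Studio does not check the key
)

# A local HuggingFace embedding model is loaded onto the GPU here (plus
# CUDA/PyTorch overhead), separate from the LLM that LM Studio holds.
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

documents = SimpleDirectoryReader("data").load_data()  # the 5.4 KB text file
index = VectorStoreIndex.from_documents(documents)  # embedding happens here

query_engine = index.as_query_engine()
print(query_engine.query("What is this document about?"))
```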
3 comments
Probably lowering the batch size on the embeddings would help:

`HuggingFaceEmbedding(..., embed_batch_size=2)`
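
In context, that would look something like the following; the model name is a placeholder, and the default `embed_batch_size` in LlamaIndex is 10, so 2 shrinks the transient activation memory used during indexing:

```python
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Fewer texts per forward pass through the embedding model means
# smaller transient activation buffers on the GPU.
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",  # placeholder; use the tutorial's model
    embed_batch_size=2,  # default is 10
)
```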
Thank you, the memory usage came down a bit.

What does the `embed_batch_size` argument do?
It is the number of Nodes that are embedded in parallel, i.e. per forward pass through the embedding model.
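
To see the effect directly, and to get a rough per-document estimate (question 3), one approach is to embed a fixed set of texts at different batch sizes while reading PyTorch's peak-allocation counter. A sketch, assuming a CUDA build of PyTorch and a placeholder embedding model; note that this only counts allocations made by this process, while `nvidia-smi` also shows the VRAM held by LM Studio's separate process:

```python
import torch
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

texts = ["a chunk of the document, a few hundred characters long"] * 100  # stand-in node texts

for batch_size in (32, 10, 2):
    embed_model = HuggingFaceEmbedding(
        model_name="BAAI/bge-small-en-v1.5",  # placeholder model
        embed_batch_size=batch_size,
    )
    torch.cuda.reset_peak_memory_stats()
    # get_text_embedding_batch() chunks the list internally by embed_batch_size.
    embed_model.get_text_embedding_batch(texts)
    peak_mib = torch.cuda.max_memory_allocated() / 1024**2
    print(f"embed_batch_size={batch_size}: peak {peak_mib:.0f} MiB")
    del embed_model
    torch.cuda.empty_cache()  # release the model before the next run
```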