VRAM usage
Hi, I just started learning LlamaIndex and I'm going through this tutorial. I'm using an RTX 3070 with 8 GB of VRAM, and I'm loading the mistral-instruct-7B-Q4_K_M LLM (4.37 GB) with LM Studio.
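For context, LM Studio serves the model through a local OpenAI-compatible endpoint, so my LlamaIndex setup points at it roughly like this (a minimal sketch; the port is LM Studio's default and the model identifier is just a placeholder):

```python
# Minimal sketch: pointing LlamaIndex at LM Studio's local server.
# The port (1234 is LM Studio's default) and model name are placeholders.
from llama_index.llms.openai_like import OpenAILike
from llama_index.core import Settings

Settings.llm = OpenAILike(
    api_base="http://localhost:1234/v1",  # LM Studio's OpenAI-compatible endpoint
    api_key="lm-studio",                  # dummy key; the local server ignores it
    model="mistral-instruct-7b-q4_k_m",   # placeholder model identifier
)
```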
Everything was fine at first: I kept asking questions about my document until, at some point, it returned an insufficient-memory error. So I printed the VRAM usage at different points in the code, and I realised that LlamaIndex uses much more VRAM than I expected, especially since my test text is only 5.4 KB.
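The probe I used looks roughly like this (a minimal sketch using the nvidia-ml-py / pynvml bindings; the tag strings are just labels):

```python
# Minimal sketch of the VRAM probe, using the nvidia-ml-py (pynvml) bindings.
import pynvml

pynvml.nvmlInit()
_handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0 = the RTX 3070

def print_vram(tag: str) -> None:
    """Print total VRAM currently in use on the GPU, labelled with `tag`."""
    info = pynvml.nvmlDeviceGetMemoryInfo(_handle)
    print(f"[{tag}] VRAM used: {info.used / 1024**2:.0f} MiB")

print_vram("before loading documents")
# ... load documents, build the index, query ...
print_vram("after building the index")
```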
From loading the small text file to building the VectorStoreIndex, 2.2 GB was needed.
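The indexing step itself is just the usual couple of lines, sketched below; the embedding model is an assumption (the tutorial's exact one may differ). As I understand it, the embedding model gets loaded onto the GPU at this point regardless of how small the document is:

```python
# Minimal sketch of the indexing step. The embedding model is an
# assumption (BAAI/bge-small-en-v1.5); the tutorial may use another one.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Loading the embedding model claims GPU memory up front,
# independent of the document's size.
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    device="cuda",
)

documents = SimpleDirectoryReader(input_files=["my_document.txt"]).load_data()
index = VectorStoreIndex.from_documents(documents)  # embeds every chunk
```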
I have three questions:
- Why is it using so much memory?
- Is there any way to mitigate this?
- How can I estimate how much memory is needed for my document?
Below is my code, along with the output from nvidia-smi.
Thank you very much!