Is it possible to offload a model from GPU VRAM?
I load an embedding model onto the CUDA GPU like this:
from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# chunking configuration
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=64)
Settings.chunk_size = 512
Settings.chunk_overlap = 64

# https://huggingface.co/OrdalieTech/Solon-embeddings-large-0.1
embed_model_name = "OrdalieTech/Solon-embeddings-large-0.1"
embed_model = HuggingFaceEmbedding(model_name=embed_model_name)
Settings.embed_model = embed_model

# ..... (documents are prepared here)
vector_store_index = VectorStoreIndex.from_documents(documents=documents, show_progress=True)
Then, for another calculation, I need to load a different embedding model, but I get a CUDA out of memory error because the previous model is still resident in GPU VRAM.
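Here is a minimal sketch of what I assume might release the VRAM before loading the second model, based on the usual PyTorch approach of dropping references and clearing the CUDA cache. I am not sure this is the intended way with LlamaIndex, and the model name for the second model is just a placeholder:

import gc
import torch

# Drop every Python reference to the first embedding model.
# Note: the VectorStoreIndex may also keep an internal reference to it,
# so that object might need to be discarded as well.
Settings.embed_model = None
del embed_model

# Let Python collect the objects, then ask PyTorch to release its cached CUDA blocks.
gc.collect()
torch.cuda.empty_cache()

# Hopefully the VRAM is now free for the next model (placeholder name):
# embed_model = HuggingFaceEmbedding(model_name="some/other-embedding-model")

Is this the right approach, or does LlamaIndex provide a proper way to unload an embedding model?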