Is it possible to offload a model from GPU VRAM?
I load an embedding model onto the CUDA GPU like this:
from llama_index.core import Settings, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# chunking configuration
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=64)
Settings.chunk_size = 512
Settings.chunk_overlap = 64

# https://huggingface.co/OrdalieTech/Solon-embeddings-large-0.1
embed_model_name = "OrdalieTech/Solon-embeddings-large-0.1"
embed_model = HuggingFaceEmbedding(model_name=embed_model_name)
Settings.embed_model = embed_model

# ..... (documents are prepared here)
vector_store_index = VectorStoreIndex.from_documents(documents=documents, show_progress=True)
Then, for another calculation, I need to load a different embedding model, but I get a CUDA out of memory error because the previous model is still resident in GPU VRAM.
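Here is a minimal sketch of what I assume might release the VRAM before loading the second model, based on the usual PyTorch approach of dropping references and clearing the CUDA cache. I am not sure this is the intended way with LlamaIndex, and the model name for the second model is just a placeholder:

import gc
import torch

# Drop every Python reference to the first embedding model.
# Note: the VectorStoreIndex may also keep an internal reference to it,
# so that object might need to be discarded as well.
Settings.embed_model = None
del embed_model

# Let Python collect the objects, then ask PyTorch to release its cached CUDA blocks.
gc.collect()
torch.cuda.empty_cache()

# Hopefully the VRAM is now free for the next model (placeholder name):
# embed_model = HuggingFaceEmbedding(model_name="some/other-embedding-model")

Is this the right approach, or does LlamaIndex provide a proper way to unload an embedding model?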