----------------------
To load a HuggingFace embedding model into CPU RAM instead of GPU, you can use the HuggingFaceEmbedding class provided by LlamaIndex. Here's an example of how to do it:
from llama_index.embeddings import HuggingFaceEmbedding
# loads BAAI/bge-small-en-v1.5
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
In this code, HuggingFaceEmbedding is a generic wrapper around any HuggingFace model for embeddings. The model_name parameter specifies the model to load; in this case, it loads the BAAI/bge-small-en-v1.5 model.
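If you want to be certain the model stays on the CPU even when a GPU is present, you can pass the device explicitly. The device keyword below is an assumption based on recent LlamaIndex versions; check the HuggingFaceEmbedding signature in your installed version if it differs.

from llama_index.embeddings import HuggingFaceEmbedding

# Assumption: the device keyword is supported in your LlamaIndex version;
# "cpu" keeps the underlying torch model in CPU RAM.
embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    device="cpu",
)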
After loading the model, you can generate embeddings for a text like this:
embeddings = embed_model.get_text_embedding("Hello World!")
print(len(embeddings))
print(embeddings[:5])
This will print the length of the embedding vector (384 dimensions for bge-small-en-v1.5) and its first 5 elements.
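If you need to embed several texts at once, the embedding classes also expose a batch helper. The sketch below assumes get_text_embedding_batch is available in your LlamaIndex version:

texts = ["Hello World!", "Embedding on the CPU with LlamaIndex"]
# Assumption: get_text_embedding_batch exists in your version; it returns
# one embedding (a list of floats) per input text.
batch_embeddings = embed_model.get_text_embedding_batch(texts)
print(len(batch_embeddings))     # number of texts
print(len(batch_embeddings[0]))  # embedding dimension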
Please note that HuggingFace models loaded through the transformers library are placed in CPU RAM by default. If you want to run the model on a GPU instead, you need to move it there explicitly, for example with PyTorch's .to('cuda') method.
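As a rough sketch of what that move looks like for a raw transformers model (this is illustrative and separate from the LlamaIndex wrapper above):

from transformers import AutoModel, AutoTokenizer
import torch

# from_pretrained loads the weights into CPU RAM by default.
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-small-en-v1.5")
model = AutoModel.from_pretrained("BAAI/bge-small-en-v1.5")

# Move the model to the GPU only if one is actually available.
if torch.cuda.is_available():
    model = model.to("cuda")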
For more details, you can refer to the LlamaIndex documentation and the example notebook on GitHub.
----------------------
Note: you can ask me a follow-up question by @ mentioning me again :speech_balloon:
----------------------