How can i

Please stand by @Hoaz as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
I'm sorry, but your question seems to be incomplete. Could you please provide more details or context? I'm here to help with questions related to LlamaIndex.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai How can I load an Embedding model from hugging face into cpu ram instead of gpu?
Let me check the knowledge base for you. One moment! :books:
----------------------
To load a HuggingFace embedding model into CPU RAM instead of GPU, you can use the HuggingFaceEmbedding class provided by LlamaIndex. Here's an example of how to do it:

Plain Text
from llama_index.embeddings import HuggingFaceEmbedding

# loads BAAI/bge-small-en-v1.5
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")


In this code, HuggingFaceEmbedding is a generic wrapper around any HuggingFace model for embeddings. The model_name parameter specifies the model to load. In this case, it's loading the BAAI/bge-small-en-v1.5 model.
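
If you want to keep the model in CPU RAM even when a GPU is available, recent versions of LlamaIndex also accept a device argument on HuggingFaceEmbedding. This is a minimal sketch; the exact argument and its behavior may vary between LlamaIndex versions:

Plain Text
from llama_index.embeddings import HuggingFaceEmbedding

# keep the model in CPU RAM even if a GPU is available
# (device handling may vary between LlamaIndex versions)
embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    device="cpu",
)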

After loading the model, you can generate embeddings for a text like this:

Plain Text
embeddings = embed_model.get_text_embedding("Hello World!")
print(len(embeddings))
print(embeddings[:5])


This will print the length of the embedding vector and its first 5 elements.

Please note that by default, HuggingFace models are loaded into CPU RAM. If you want to run the model on a GPU instead, you need to move it there explicitly, for example with PyTorch's .to('cuda') method.
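
If you are working with the underlying transformers model directly rather than through the LlamaIndex wrapper, the device move mentioned above looks roughly like this. This is a minimal sketch, assuming the transformers and torch packages are installed and a CUDA device is available for the .to('cuda') call:

Plain Text
from transformers import AutoModel

# transformers loads the weights into CPU RAM by default
model = AutoModel.from_pretrained("BAAI/bge-small-en-v1.5")

# the model only moves to the GPU if you explicitly ask for it
model = model.to("cuda")

# ...and can be moved back into CPU RAM the same way
model = model.to("cpu")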

For more details, you can refer to the LlamaIndex documentation and the example notebook on GitHub.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------