Hello, I am trying to figure out whether it's possible to run the embedding model on my GPU rather than the CPU. I have this simple script, in which `VectorStoreIndex.from_documents(documents)` takes a long time to finish while maxing out my CPU:
```python
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext, set_global_service_context
from llama_index.llms import OpenAILike

llm = OpenAILike(max_tokens=3900)
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model="local:BAAI/bge-small-en-v1.5",
    chunk_size=256,
    num_output=256,
)
set_global_service_context(service_context)

documents = SimpleDirectoryReader('data2').load_data()
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(persist_dir="./vector-storage-esic2")
```
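For what it's worth, I'm assuming the first thing to rule out is PyTorch simply not seeing the GPU at all, so before blaming Llama-Index I'd sanity-check something like this:

```python
import torch

# Sanity check (my assumption that this is the relevant first step):
# confirm PyTorch can actually see a CUDA device at all.
print(torch.cuda.is_available())   # expect True
print(torch.cuda.device_count())   # expect >= 1
```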
It seems like one of the following is true:
- I haven't configured something properly (in Llama-Index?) that would push the embedding work to the GPU (see my rough attempt below)
- This is just how Llama-Index works, and it can only use the CPU for embeddings
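In case it helps, here's roughly what I was thinking of trying for the first option: replacing the `local:...` string with an explicit `HuggingFaceEmbedding`, passing a `device` argument. I'm not certain this is the supported way to do it (the `device="cuda"` parameter is an assumption on my part), so please correct me if this is wrong:

```python
from llama_index import ServiceContext, set_global_service_context
from llama_index.embeddings import HuggingFaceEmbedding
from llama_index.llms import OpenAILike

# Assumption: HuggingFaceEmbedding accepts a `device` argument that pins the
# embedding model to the GPU instead of defaulting to the CPU.
embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5",
    device="cuda",
)

llm = OpenAILike(max_tokens=3900)
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    chunk_size=256,
    num_output=256,
)
set_global_service_context(service_context)
```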
Any wisdom is greatly appreciated!