----------------------
Based on the information provided in the extracts, there are no specific instructions on how to run a HuggingFace embedding model over multiple GPUs using LlamaIndex. However, you can generally use the device_map parameter when initializing HuggingFaceLLM to specify the device(s) you want to use. Here is an example:
import torch
from llama_index.llms import HuggingFaceLLM  # in newer LlamaIndex versions: llama_index.llms.huggingface

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=2048,
    generate_kwargs={"temperature": 0.0, "do_sample": False},
    query_wrapper_prompt=query_wrapper_prompt,  # your prompt template
    tokenizer_name=selected_model,              # a Hugging Face model id or local path
    model_name=selected_model,
    device_map="auto",                          # change this to specify your GPUs
    model_kwargs={"torch_dtype": torch.float16, "load_in_8bit": True},
)
Please note that the device_map parameter is set to "auto" in this example, which means it will automatically use the available device(s). If you want to use multiple GPUs, you might need to specify them manually; the exact way to do this is not mentioned in the provided extracts, but a hedged sketch follows below.
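As a minimal sketch (not from the extracts), and assuming HuggingFaceLLM forwards device_map and model_kwargs to transformers' from_pretrained as in current LlamaIndex releases, you can pass the same multi-GPU values that transformers/accelerate accept. The model id, memory limits, and the "balanced" strategy below are placeholders, not values from the extracts:

import torch
from llama_index.llms import HuggingFaceLLM  # import path may vary across LlamaIndex versions

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=2048,
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",  # placeholder model id
    model_name="meta-llama/Llama-2-7b-chat-hf",
    device_map="balanced",  # shard layers evenly across all visible GPUs
    model_kwargs={
        "torch_dtype": torch.float16,
        # optionally cap per-GPU memory so accelerate spreads weights across devices:
        "max_memory": {0: "20GiB", 1: "20GiB"},
    },
)

You can also control which GPUs are visible in the first place by launching the script with, for example, CUDA_VISIBLE_DEVICES=0,1, so that device_map="auto" only sees the devices you intend to use.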
For more detailed information, you might want to refer to the LlamaIndex documentation or the HuggingFace documentation.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------