How can I run a Hugging Face embedding model over multiple GPUs?

@kapa.ai how can I run a Hugging Face embedding model over multiple GPUs?
Please stand by @balanp as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
Based on the information provided in the extracts, there are no specific instructions on running a Hugging Face embedding model over multiple GPUs with LlamaIndex. In general, however, you can use the device_map parameter when initializing HuggingFaceLLM to control which device(s) the model is loaded onto. Here is an example:

Python
import torch
# On older LlamaIndex versions this import may instead be
# `from llama_index.llms import HuggingFaceLLM`.
from llama_index.llms.huggingface import HuggingFaceLLM

# Placeholder values so the snippet runs as-is; substitute your own.
selected_model = "meta-llama/Llama-2-7b-chat-hf"
query_wrapper_prompt = "<s>[INST] {query_str} [/INST]"

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=2048,
    generate_kwargs={"temperature": 0.0, "do_sample": False},
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name=selected_model,
    model_name=selected_model,
    device_map="auto",  # let the loader place the model on the available GPU(s)
    model_kwargs={"torch_dtype": torch.float16, "load_in_8bit": True},
)


Please note that device_map is set to "auto" in this example, which lets the loader automatically place the model on whatever device(s) are available. If you want to control which GPUs are used, you may need to specify the mapping manually; however, the exact way to do this is not covered in the provided extracts.
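Since device_map is ultimately passed through to Hugging Face's model loader, it should also accept the usual accelerate-style values, such as a placement-strategy string or an explicit module-to-device dict. The sketch below is based on those accelerate conventions rather than anything in the extracts, and it reuses the selected_model placeholder from the example above:

Python
# Spread the model's layers evenly across all visible GPUs:
llm = HuggingFaceLLM(
    model_name=selected_model,
    tokenizer_name=selected_model,
    device_map="balanced",  # accelerate strategy string, like "auto" or "sequential"
)

# Or pin modules to devices with an explicit dict; "" means the whole model:
llm = HuggingFaceLLM(
    model_name=selected_model,
    tokenizer_name=selected_model,
    device_map={"": 0},  # load everything onto GPU 0
)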

For more detailed information, you might want to refer to the LlamaIndex documentation or the Hugging Face documentation.
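As a final note, since the question was specifically about embedding models: this is not covered in the extracts, but if you can use the sentence-transformers library directly (it backs many Hugging Face embedding models), it provides a multi-process pool for encoding across several GPUs. A rough sketch, assuming two GPUs and a placeholder model id:

Python
from sentence_transformers import SentenceTransformer

# Hedged example: the model id and device list below are placeholders.
if __name__ == "__main__":  # needed because the pool spawns worker processes
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

    # Start one worker process per listed GPU.
    pool = model.start_multi_process_pool(target_devices=["cuda:0", "cuda:1"])

    sentences = ["first document", "second document"]  # your corpus here
    embeddings = model.encode_multi_process(sentences, pool)

    model.stop_multi_process_pool(pool)
    print(embeddings.shape)  # (num_sentences, embedding_dim)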
----------------------
Note: you can ask me a follow-up question by @-mentioning me again :speech_balloon:
----------------------