Hey everyone, I need to run HuggingFaceEmbedding with multi-GPU support. For this I tried the following code:
Plain Text
from injector import inject, singleton
from llama_index import MockEmbedding
from llama_index.embeddings.base import BaseEmbedding

from private_gpt.paths import models_cache_path
from private_gpt.settings.settings import settings

from torch.nn.parallel import DataParallel
from torch.nn.parallel import DistributedDataParallel

@singleton
class EmbeddingComponent:
    embedding_model: BaseEmbedding

    @inject
    def __init__(self) -> None:
        match settings.llm.mode:
            case "local":
                from llama_index.embeddings import HuggingFaceEmbedding

                embedding_model = HuggingFaceEmbedding(
                    model_name=settings.local.embedding_hf_model_name,
                    cache_folder=str(models_cache_path),
                    embed_batch_size = 20,
                )
                self.embedding_model = DataParallel(embedding_model)
            case "sagemaker":

                from private_gpt.components.embedding.custom.sagemaker import (
                    SagemakerEmbedding,
                )

                self.embedding_model = SagemakerEmbedding(
                    endpoint_name=settings.sagemaker.embedding_endpoint_name,
                )


I got the following exception when running this:
Plain Text
  File "/home/bennison/Documents/yavar/poc/privateGPT/private_gpt/components/embedding/embedding_component.py", line 25, in __init__
    self.embedding_model = DataParallel(embedding_model)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bennison/.cache/pypoetry/virtualenvs/private-gpt-_Dc3_tu1-py3.11/lib/python3.11/site-packages/torch/nn/parallel/data_parallel.py", line 148, in __init__
    self.module.to(self.src_device_obj)
    ^^^^^^^^^^^^^^
AttributeError: 'HuggingFaceEmbedding' object has no attribute 'to'
make: *** [Makefile:36: run] Error 1
Uhhhh, I don't think this will work, since HuggingFaceEmbedding is not a PyTorch model; it's a wrapper around a model

I think you'd have to do the parallelization with the underlying model. I thiiiiink you could wrap the model from huggingface in this?

So load the model with AutoModel and put it in this class you've created

Then you can pass the model in directly like HuggingFaceEmbedding(model=model) ?
I tried it as you said; here is the refactored code:
Plain Text
                model = AutoModel.from_pretrained( # BAAI/bge-small-en
                    settings.local.embedding_hf_model_name, cache_dir=models_cache_path
                )
                self.embedding_model = HuggingFaceEmbedding(
                    model=model,
                )


After updating the code I got the following exception
Plain Text
  File "/home/bennison/.cache/pypoetry/virtualenvs/private-gpt-_Dc3_tu1-py3.11/lib/python3.11/site-packages/llama_index/embeddings/huggingface.py", line 98, in __init__
    super().__init__(
  File "/home/bennison/.cache/pypoetry/virtualenvs/private-gpt-_Dc3_tu1-py3.11/lib/python3.11/site-packages/pydantic/v1/main.py", line 341, in __init__
    raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for HuggingFaceEmbedding
model_name
  none is not an allowed value (type=type_error.none.not_allowed)
make: *** [Makefile:36: run] Error 1


I got the exception on model_name. When I update the code like the above, is the model name required?
Ah yeah, try also passing in the model_name -- it will still use the model you pass in; the model_name is just there for tracking/observability.
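
For reference, here is a minimal self-contained sketch of that suggestion, assuming the older llama_index import path used elsewhere in this thread and the BAAI/bge-small-en model mentioned in the snippet above; treat the exact names as placeholders for your settings values.
Plain Text
from transformers import AutoModel
from llama_index.embeddings import HuggingFaceEmbedding

# Assumed model name, taken from the comment in the earlier snippet
model_name = "BAAI/bge-small-en"

# Load the underlying transformers model once
model = AutoModel.from_pretrained(model_name)

# Pass both the pre-loaded model and its name; the name is what the
# pydantic validator complained about, and is otherwise only used for
# tracking/observability
embed_model = HuggingFaceEmbedding(
    model=model,
    model_name=model_name,
)
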
Plain Text
                model = AutoModel.from_pretrained( # BAAI/bge-small-en
                    settings.local.embedding_hf_model_name, cache_dir=models_cache_path
                )
                self.embedding_model = HuggingFaceEmbedding(
                    model=model,
                )


Here in the above code I did not configure anything about CUDA. Will it use all available (multiple) GPUs automatically, or should I do any config for this (multi-GPU utilization)?
Hmmm yea something tricky is the inputs. With multiple GPUs, your inputs need to be on the same device

I know you can specify a specific device in the constructor

Plain Text
self.embedding_model = HuggingFaceEmbedding(
    model=model,
    device="cuda:0"
)


But not sure that entirely solves the issue 🤔
Hey man, can you explain more elaborately? I could not understand what you are saying. If I use the device parameter with cuda:0, will it use all available GPUs?
So you have multiple GPUs

You might have a GPU on cuda:0 and another GPU on cuda:1

So you need to ensure that your inputs are also moved to the same device
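
To illustrate the device point with plain PyTorch (a toy nn.Linear standing in for the embedding model, purely for illustration): DataParallel keeps the module's parameters on a source device and gathers results there, so the batch you feed it should be placed on that same device.
Plain Text
import torch
import torch.nn as nn

# Hypothetical toy module standing in for an embedding model
model = nn.Linear(384, 384)

if torch.cuda.is_available():
    # Replicas run on all visible GPUs; parameters live on cuda:0
    dp_model = nn.DataParallel(model).to("cuda:0")

    # Put the input batch on that same source device
    batch = torch.randn(32, 384, device="cuda:0")

    # The batch is split across the GPUs and the outputs are gathered back on cuda:0
    out = dp_model(batch)
    print(out.shape, out.device)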

Tbh I don't actually think this will work though.... too complicated.

I would try using something like text-embeddings-inference for proper multi-GPU support

https://github.com/huggingface/text-embeddings-inference
https://docs.llamaindex.ai/en/stable/examples/embeddings/text_embedding_inference.html
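
On the llama_index side, the usage would look roughly like the sketch below (adapted from the second link). It assumes a text-embeddings-inference server is already running locally, e.g. started from the Docker image in the first link; the base_url and port are assumptions to adjust for your setup.
Plain Text
from llama_index.embeddings import TextEmbeddingsInference

embed_model = TextEmbeddingsInference(
    model_name="BAAI/bge-small-en",     # used to format text/queries for the served model
    base_url="http://127.0.0.1:8080",   # wherever your TEI server is listening (assumed)
    embed_batch_size=20,                # matches the batch size used earlier in this thread
)

# Quick sanity check against the server
embedding = embed_model.get_text_embedding("hello world")
print(len(embedding))
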
I used the device parameter also; it still uses a single GPU, not all of them.
Is there any other way to do this?
I did link an alternative using TEI from huggingface

I am less knowledgeable about this raw torch.nn.parallel stuff. It not using all GPUs is likely related to this code not specifying any device?

Plain Text
case "local":
  from llama_index.embeddings import HuggingFaceEmbedding
  
  embedding_model = HuggingFaceEmbedding(
      model_name=settings.local.embedding_hf_model_name,
      cache_folder=str(models_cache_path),
      embed_batch_size = 20,
  )
  self.embedding_model = DataParallel(embedding_model)
Other than that, 🤷‍♂️ out of ideas
I don't have any idea about embedding with multi-GPU. Do you have any other ideas?
Here also, I don't know how to configure multi-GPU. Can you help me with this?
I'm pretty sure the docker command they give will use all gpus
docker run --gpus all ...
Isn't it enough to just install the package, or do I need to run it as a Docker container?
I've only ever used it as a Docker container. The instructions in the readme for running it as a local install seemed more complicated lol