Use cpu

At a glance

The post discusses forcing HuggingFaceEmbeddings, which uses SentenceTransformer under the hood, to run on the CPU: the device can be set with SentenceTransformer('model_name_or_path', device='cpu'), but this is not the default behavior. The comments suggest setting the CUDA_VISIBLE_DEVICES environment variable to an empty string to keep the model off the GPU. This does not always work, and the community members discuss other options, such as a LangChain change so the device option can be passed directly into the model definition. The post also includes a request for help fine-tuning the LLaMA 2 model and loading it on a local machine.

Useful resources
I saw that HuggingFaceEmbeddings will use SentenceTransformer, and there you can set: "model = SentenceTransformer('model_name_or_path', device='cpu')", but it's not the default behavior xD (Source: https://www.sbert.net/examples/applications/computing-embeddings/README.html)
11 comments
Hmmm, it looks like the only way is to use an env variable.

os.environ["CUDA_VISIBLE_DEVICES"] = ""
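For context, CUDA_VISIBLE_DEVICES only takes effect if it is set before PyTorch initializes CUDA, so in a script it belongs at the very top, before the model libraries are imported. A minimal sketch (the model name is only an example):

import os

# Hide all GPUs from CUDA *before* any CUDA-aware library is imported,
# otherwise the setting may be ignored once the runtime is initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

from sentence_transformers import SentenceTransformer

# Example model name; any SentenceTransformer checkpoint works the same way.
model = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")
print(model.encode(["hello world"]).shape)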
Thanks for your help 😄 it still goes to the GPU 😦 my code: https://gist.github.com/devinSpitz/15c55b244ba372088165a96184020040
[Attachment: image.png]
Hmmm, maybe it needs to be an env variable before launching?

export CUDA_VISIBLE_DEVICES=""
trying right now 😄
If that doesn't work, LangChain might need a PR so that the device option can be passed into the model definition 😅
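For the record, later LangChain releases do expose this: HuggingFaceEmbeddings takes a model_kwargs dict that is forwarded to the SentenceTransformer constructor, so the device can be pinned without the env-variable trick. A sketch assuming a reasonably recent langchain install (the import path and model name may differ by version):

from langchain.embeddings import HuggingFaceEmbeddings

# model_kwargs is passed through to SentenceTransformer(...), so
# device='cpu' keeps the embedding model off the GPU.
embedder = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"},
)
vectors = embedder.embed_documents(["hello world"])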
Ohh, it actually did work I think, because I get another error now 😄 :
We need an offload_dir to dispatch this model according to this device_map, the following submodules need to be offloaded: base_model.model.model.layers.1, base_model.model.model.layers.2, base_model.model.model.layers.3, base_model.model.model.layers.4, base_model.model.model.layers.5, base_model.model.model.layers.6, base_model.model.model.layers.7, base_model.model.model.layers.8, base_model.model.model.layers.9, base_model.model.model.layers.10, base_model.model.model.layers.11, base_model.model.model.layers.12, base_model.model.model.layers.13, base_model.model.model.layers.14, base_model.model.model.layers.15, base_model.model.model.layers.16, base_model.model.model.layers.17, base_model.model.model.layers.18, base_model.model.model.layers.19, base_model.model.model.layers.20, base_model.model.model.layers.21, base_model.model.model.layers.22, base_model.model.model.layers.23, base_model.model.model.layers.24, base_model.model.model.layers.25, base_model.model.model.layers.26, base_model.model.model.layers.27, base_model.model.model.layers.28, base_model.model.model.layers.29, base_model.model.model.layers.30, base_model.model.model.layers.31, base_model.model.model.norm, base_model.model.lm_head.
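That error comes from accelerate: with a device_map in play and not enough room for every layer, it needs a directory to offload the remaining weights into. A minimal sketch of one way to satisfy it, assuming the model is loaded through transformers (the model id and folder name are placeholders):

from transformers import AutoModelForCausalLM

# offload_folder gives accelerate somewhere to park the layers that do
# not fit on the available devices when device_map is used.
model = AutoModelForCausalLM.from_pretrained(
    "your-base-model",        # placeholder model id
    device_map="auto",
    offload_folder="offload",
)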
Ooo interesting! 👀
It is trying to answer right now. I have it with dolly xD it takes very long but works 😄 updated the code in the link above 😄
[Attachments: image.png ×3]
Heh yea, running the LLM on the CPU will probably not be too fast 😅 but at least it works!
Hi everyone. I want to fine-tune the LLaMA 2 model and use it on my local machine. I trained the model using autotrain-advanced on Colab and am now trying to load it on my local machine. Please share references for how I can load the fine-tuned model and generate responses to queries. I got the files below after fine-tuning (one way to load them is sketched after the list):
adapter_config.json
adapter_model.bin
added_tokens.json
optimizer.pt
pytorch_model.bin
README.md
rng_state.pth
scheduler.pt
special_tokens_map.json
tokenizer.json
tokenizer.model
tokenizer_config.json
trainer_state.json
training_args.bin
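The adapter_config.json / adapter_model.bin pair suggests autotrain produced a PEFT (LoRA) adapter rather than a full model. A sketch of loading it locally, assuming the peft and transformers packages; the paths are placeholders and the base model id must match the one used for fine-tuning:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"       # must match the base used for training
adapter_dir = "path/to/autotrain-output"   # folder with adapter_config.json etc.

# The tokenizer files (tokenizer.json, tokenizer.model, ...) live next to the adapter.
tokenizer = AutoTokenizer.from_pretrained(adapter_dir)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
# Wrap the base model with the fine-tuned LoRA weights.
model = PeftModel.from_pretrained(base, adapter_dir)

prompt = "Summarize what LoRA fine-tuning does."
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))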