Use cpu

At a glance

The post discusses forcing HuggingFaceEmbeddings, which uses SentenceTransformer under the hood, to run on the CPU: the device can be set with SentenceTransformer('model_name_or_path', device='cpu'), but this is not the default behavior. The comments suggest setting the CUDA_VISIBLE_DEVICES environment variable to an empty string to keep the model off the GPU. This does not always work, and the community members discuss other options, such as a LangChain change so the device option can be passed directly into the model definition. The post also includes a request for help fine-tuning the LLaMA 2 model and loading it on a local machine.

Useful resources
I saw that HuggingFaceEmbeddings will use SentenceTransformer, and there you can set: "model = SentenceTransformer('model_name_or_path', device='cpu')", but it's not the default behavior xD (Source: https://www.sbert.net/examples/applications/computing-embeddings/README.html)
11 comments
Hmmm, it looks like the only way is to use an env variable.

os.environ["CUDA_VISIBLE_DEVICES"] = ""
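For context, CUDA_VISIBLE_DEVICES only takes effect if it is set before PyTorch initializes CUDA, so in a script it belongs at the very top, before the model libraries are imported. A minimal sketch (the model name is only an example):

import os

# Hide all GPUs from CUDA *before* any CUDA-aware library is imported,
# otherwise the setting may be ignored once the runtime is initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

from sentence_transformers import SentenceTransformer

# Example model name; any SentenceTransformer checkpoint works the same way.
model = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")
print(model.encode(["hello world"]).shape)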
Thanks for your help 😄 it still goes to the GPU 😦 my code: https://gist.github.com/devinSpitz/15c55b244ba372088165a96184020040
[Attachment: image.png]
Hmmm, maybe it needs to be an env variable before launching?

export CUDA_VISIBLE_DEVICES=""
trying right now 😄
If that doesn't work, LangChain might need a PR so that the device option can be passed into the model definition 😅
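For the record, later LangChain releases do expose this: HuggingFaceEmbeddings takes a model_kwargs dict that is forwarded to the SentenceTransformer constructor, so the device can be pinned without the env-variable trick. A sketch assuming a reasonably recent langchain install (the import path and model name may differ by version):

from langchain.embeddings import HuggingFaceEmbeddings

# model_kwargs is passed through to SentenceTransformer(...), so
# device='cpu' keeps the embedding model off the GPU.
embedder = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"},
)
vectors = embedder.embed_documents(["hello world"])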
Ohh, it actually did work I think, because I get another error now 😄 :
We need an offload_dir to dispatch this model according to this device_map, the following submodules need to be offloaded: base_model.model.model.layers.1, base_model.model.model.layers.2, base_model.model.model.layers.3, base_model.model.model.layers.4, base_model.model.model.layers.5, base_model.model.model.layers.6, base_model.model.model.layers.7, base_model.model.model.layers.8, base_model.model.model.layers.9, base_model.model.model.layers.10, base_model.model.model.layers.11, base_model.model.model.layers.12, base_model.model.model.layers.13, base_model.model.model.layers.14, base_model.model.model.layers.15, base_model.model.model.layers.16, base_model.model.model.layers.17, base_model.model.model.layers.18, base_model.model.model.layers.19, base_model.model.model.layers.20, base_model.model.model.layers.21, base_model.model.model.layers.22, base_model.model.model.layers.23, base_model.model.model.layers.24, base_model.model.model.layers.25, base_model.model.model.layers.26, base_model.model.model.layers.27, base_model.model.model.layers.28, base_model.model.model.layers.29, base_model.model.model.layers.30, base_model.model.model.layers.31, base_model.model.model.norm, base_model.model.lm_head.
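That error comes from accelerate: with a device_map in play and not enough room for every layer, it needs a directory to offload the remaining weights into. A minimal sketch of one way to satisfy it, assuming the model is loaded through transformers (the model id and folder name are placeholders):

from transformers import AutoModelForCausalLM

# offload_folder gives accelerate somewhere to park the layers that do
# not fit on the available devices when device_map is used.
model = AutoModelForCausalLM.from_pretrained(
    "your-base-model",        # placeholder model id
    device_map="auto",
    offload_folder="offload",
)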
Ooo interesting! 👀
It is trying to answer right now. I have it with dolly xD it takes very long but works 😄 updated the code in the link above 😄
[Attachments: image.png ×3]
Heh yea, running the LLM on the CPU will probably not be too fast 😅 but at least it works!
Hi everyone. I want to fine-tune the LLaMA 2 model and use it on my local machine. I trained the model using autotrain-advanced on Colab and am now trying to load it on my local machine. Please share references for how I can load the fine-tuned model and generate responses to queries. I got the files below after fine-tuning (one way to load them is sketched after the list):
adapter_config.json
adapter_model.bin
added_tokens.json
optimizer.pt
pytorch_model.bin
README.md
rng_state.pth
scheduler.pt
special_tokens_map.json
tokenizer.json
tokenizer.model
tokenizer_config.json
trainer_state.json
training_args.bin
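The adapter_config.json / adapter_model.bin pair suggests autotrain produced a PEFT (LoRA) adapter rather than a full model. A sketch of loading it locally, assuming the peft and transformers packages; the paths are placeholders and the base model id must match the one used for fine-tuning:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"       # must match the base used for training
adapter_dir = "path/to/autotrain-output"   # folder with adapter_config.json etc.

# The tokenizer files (tokenizer.json, tokenizer.model, ...) live next to the adapter.
tokenizer = AutoTokenizer.from_pretrained(adapter_dir)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
# Wrap the base model with the fine-tuned LoRA weights.
model = PeftModel.from_pretrained(base, adapter_dir)

prompt = "Summarize what LoRA fine-tuning does."
inputs = tokenizer(prompt, return_tensors="pt").to(base.device)
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))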