Hello everyone, I have been trying to follow this cookbook (https://docs.llamaindex.ai/en/latest/examples/cookbooks/llama3_cookbook/#setup-llm-using-huggingfacellm) to get a basic setup running Llama3, but I have been running into the following issues.
I have followed the instructions to install the latest versions of the packages, but I still get the same error when quantizing to 4-bit:

> Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`
If I don't do quantization and just run the code as in the cookbook, I get `TypeError: BFloat16 is not supported on MPS` and an `ImportError`.

Just wanted to know if anyone has experienced the same issues when trying this out on a setup similar to mine? I am on an M1 Mac Pro.
The portion where I am having trouble is here:
```python
# set up llm using HuggingFaceLLM
import torch
from llama_index.llms.huggingface import HuggingFaceLLM
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# model_name, hf_token, and stopping_ids are set earlier, as in the cookbook
llm = HuggingFaceLLM(
    model_name=model_name,
    model_kwargs={
        "token": hf_token,
        "torch_dtype": torch.bfloat16,
        "quantization_config": quantization_config,
    },
    generate_kwargs={
        "do_sample": True,
        "temperature": 0,
        "top_p": 0.9,
    },
    tokenizer_name=model_name,
    tokenizer_kwargs={"token": hf_token},
    stopping_ids=stopping_ids,
)
```
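
For reference, this is the variant I was planning to try on MPS, based on my (possibly wrong) understanding that bitsandbytes quantization only works on CUDA and that MPS does not support bfloat16, so `quantization_config` is dropped and the dtype is switched to `float16`. The positive temperature is also my own change, since I believe `transformers` rejects `temperature=0` when `do_sample=True`:

```python
# A rough sketch of what I *think* the MPS-friendly call should look like:
# no bitsandbytes quantization, and float16 instead of bfloat16.
# (Both changes are guesses on my part, not from the cookbook.)
import torch
from llama_index.llms.huggingface import HuggingFaceLLM

llm = HuggingFaceLLM(
    model_name=model_name,
    model_kwargs={
        "token": hf_token,
        "torch_dtype": torch.float16,  # MPS does not support bfloat16
        # no "quantization_config": bitsandbytes is CUDA-only as far as I know
    },
    generate_kwargs={
        "do_sample": True,
        "temperature": 0.6,  # strictly positive; temperature=0 with do_sample=True seems to error out
        "top_p": 0.9,
    },
    tokenizer_name=model_name,
    tokenizer_kwargs={"token": hf_token},
    stopping_ids=stopping_ids,
)
```

Happy to hear if anyone knows whether this is the right direction, or if there is an officially supported way to run the cookbook on Apple Silicon.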