
Updated 4 months ago

Hello everyone, I have been trying to

At a glance

A community member is trying to set up Llama3 using the instructions from a cookbook, but is encountering issues with 4-bit quantization and getting errors related to BFloat16 and MPS. They have followed the installation instructions, including installing bitsandbytes and accelerate, but are still experiencing problems. Another community member confirms that they have also installed the required packages, but there is no explicitly marked answer to the issues the original poster is facing.

Hello everyone, I have been trying to follow this cookbook (https://docs.llamaindex.ai/en/latest/examples/cookbooks/llama3_cookbook/#setup-llm-using-huggingfacellm) to get a basic setup for running Llama3, but I have been running into the following issues.

I have followed the instructions to install the latest versions of the packages, but I am still getting the same error when quantizing to 4-bit:
Plain Text
Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`
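
A quick way to rule out an environment mix-up (the error above can also appear when `accelerate` and `bitsandbytes` were installed into a different Python environment than the one running the notebook) is to check the installed versions from the same interpreter. This is a small sketch using only the standard library:

Plain Text
# Confirm accelerate/bitsandbytes are visible to the interpreter running the cookbook
from importlib.metadata import version, PackageNotFoundError

for pkg in ("accelerate", "bitsandbytes", "transformers"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed in this environment")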

If I skip quantization and just run the code as in the cookbook, I get this:
Plain Text
TypeError: BFloat16 is not supported on MPS and ImportError: 


Just want to know if anyone has experienced the same issues when trying it out on a setup similar to mine? I am on an M1 Mac Pro.
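
For context, whether `bfloat16` works on MPS depends on the PyTorch and macOS versions, so a quick diagnostic like the sketch below (not from the cookbook, just a generic check) can show what the machine actually supports:

Plain Text
import torch

print("torch:", torch.__version__)
print("MPS available:", torch.backends.mps.is_available())

# Try allocating a bfloat16 tensor on MPS; support varies by PyTorch/macOS version
try:
    torch.ones(1, dtype=torch.bfloat16, device="mps")
    print("bfloat16 on MPS: supported")
except (TypeError, RuntimeError) as exc:
    print("bfloat16 on MPS: not supported ->", exc)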

The portion where I am having trouble is here:
Plain Text
# set up llm using HuggingFaceLLM
import torch
from llama_index.llms.huggingface import HuggingFaceLLM
from transformers import BitsAndBytesConfig
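
# NOTE: model_name, hf_token, and stopping_ids are assumed to be defined in the
# earlier setup cells of the cookbook (model id, Hugging Face token, stop token ids).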

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

llm = HuggingFaceLLM(
    model_name=model_name,
    model_kwargs={
        "token": hf_token,
        "torch_dtype": torch.bfloat16,
        "quantization_config": quantization_config
    },
    generate_kwargs={
        "do_sample": True,
        "temperature": 0,
        "top_p": 0.9,
    },
    tokenizer_name=model_name,
    tokenizer_kwargs={"token": hf_token},
    stopping_ids=stopping_ids,
)
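
For comparison, below is a minimal sketch of the same setup with the `bitsandbytes` config dropped and `float16` instead of `bfloat16`; bitsandbytes 4-bit quantization generally targets CUDA GPUs rather than Apple Silicon, so this is only a guess at a workaround, not a confirmed fix (it reuses the cookbook's `model_name`, `hf_token`, and `stopping_ids` variables):

Plain Text
# Hypothetical MPS-friendly variant: no bitsandbytes quantization, float16 dtype
import torch
from llama_index.llms.huggingface import HuggingFaceLLM

llm = HuggingFaceLLM(
    model_name=model_name,
    model_kwargs={
        "token": hf_token,
        "torch_dtype": torch.float16,  # avoid bfloat16 on MPS
    },
    generate_kwargs={
        "do_sample": True,
        "temperature": 0.6,  # do_sample=True needs a strictly positive temperature
        "top_p": 0.9,
    },
    tokenizer_name=model_name,
    tokenizer_kwargs={"token": hf_token},
    stopping_ids=stopping_ids,
    device_map="auto",  # let accelerate decide MPS/CPU placement
)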
2 comments
Did you install as the command says?