
Updated 4 months ago

Hello everyone, I have been trying to

At a glance

A community member is trying to set up Llama3 using the instructions from a cookbook, but is encountering issues with 4-bit quantization and getting errors related to BFloat16 and MPS. They have followed the installation instructions, including installing bitsandbytes and accelerate, but are still experiencing problems. Another community member confirms that they have also installed the required packages, but there is no explicitly marked answer to the issues the original poster is facing.

Hello everyone, I have been trying to follow this cookbook (https://docs.llamaindex.ai/en/latest/examples/cookbooks/llama3_cookbook/#setup-llm-using-huggingfacellm) to get a basic setup for running Llama3, but I have been running into the following issues.

I have followed the instructions to install the latest versions of the packages, but I am still getting the same error when quantizing to 4-bit:
Plain Text
Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`
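
A quick way to rule out an environment mix-up (the error above can also appear when `accelerate` and `bitsandbytes` were installed into a different Python environment than the one running the notebook) is to check the installed versions from the same interpreter. This is a small sketch using only the standard library:

Plain Text
# Confirm accelerate/bitsandbytes are visible to the interpreter running the cookbook
from importlib.metadata import version, PackageNotFoundError

for pkg in ("accelerate", "bitsandbytes", "transformers"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed in this environment")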

If I skip quantization and just run the code as in the cookbook, I get this:
Plain Text
TypeError: BFloat16 is not supported on MPS and ImportError: 


Just want to know if anyone has experienced the same issues when trying it out on a setup similar to mine? I am on an M1 Mac Pro.
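
For context, whether `bfloat16` works on MPS depends on the PyTorch and macOS versions, so a quick diagnostic like the sketch below (not from the cookbook, just a generic check) can show what the machine actually supports:

Plain Text
import torch

print("torch:", torch.__version__)
print("MPS available:", torch.backends.mps.is_available())

# Try allocating a bfloat16 tensor on MPS; support varies by PyTorch/macOS version
try:
    torch.ones(1, dtype=torch.bfloat16, device="mps")
    print("bfloat16 on MPS: supported")
except (TypeError, RuntimeError) as exc:
    print("bfloat16 on MPS: not supported ->", exc)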

The portion where I am having trouble is here:
Plain Text
# set up llm using HuggingFaceLLM
import torch
from llama_index.llms.huggingface import HuggingFaceLLM
from transformers import BitsAndBytesConfig
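
# NOTE: model_name, hf_token, and stopping_ids are assumed to be defined in the
# earlier setup cells of the cookbook (model id, Hugging Face token, stop token ids).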

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

llm = HuggingFaceLLM(
    model_name=model_name,
    model_kwargs={
        "token": hf_token,
        "torch_dtype": torch.bfloat16,
        "quantization_config": quantization_config
    },
    generate_kwargs={
        "do_sample": True,
        "temperature": 0,
        "top_p": 0.9,
    },
    tokenizer_name=model_name,
    tokenizer_kwargs={"token": hf_token},
    stopping_ids=stopping_ids,
)
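
For comparison, below is a minimal sketch of the same setup with the `bitsandbytes` config dropped and `float16` instead of `bfloat16`; bitsandbytes 4-bit quantization generally targets CUDA GPUs rather than Apple Silicon, so this is only a guess at a workaround, not a confirmed fix (it reuses the cookbook's `model_name`, `hf_token`, and `stopping_ids` variables):

Plain Text
# Hypothetical MPS-friendly variant: no bitsandbytes quantization, float16 dtype
import torch
from llama_index.llms.huggingface import HuggingFaceLLM

llm = HuggingFaceLLM(
    model_name=model_name,
    model_kwargs={
        "token": hf_token,
        "torch_dtype": torch.float16,  # avoid bfloat16 on MPS
    },
    generate_kwargs={
        "do_sample": True,
        "temperature": 0.6,  # do_sample=True needs a strictly positive temperature
        "top_p": 0.9,
    },
    tokenizer_name=model_name,
    tokenizer_kwargs={"token": hf_token},
    stopping_ids=stopping_ids,
    device_map="auto",  # let accelerate decide MPS/CPU placement
)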
2 comments
Did you install as the command says?