Hey guys I feel stupid posting this but

Hey guys, I feel stupid posting this, but after quantizing the model like this:

Plain Text
llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-alpha",
    tokenizer_name="HuggingFaceH4/zephyr-7b-alpha",
    query_wrapper_prompt=PromptTemplate("<|system|>\n</s>\n<|user|>\n{query_str}</s>\n<|assistant|>\n"),
    context_window=3900,
    max_new_tokens=256,
    model_kwargs={"quantization_config": quantization_config},
    # tokenizer_kwargs={},
    generate_kwargs={"temperature": 0.7, "top_k": 50, "top_p": 0.95},
    messages_to_prompt=messages_to_prompt,
    device_map="auto",
)

How are we supposed to store the quantized model locally?
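(The quantization_config referenced above isn't shown in the question; assuming it's a 4-bit bitsandbytes config, it would typically be built roughly like this:)

Plain Text
import torch
from transformers import BitsAndBytesConfig

# example 4-bit NF4 config; the exact config used in the question isn't shown
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)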
model.save_pretrained("path/to/save") I think?
that won't work, since you can't access the model from the HuggingFaceLLM class
that was the problem I ran into
you'd have to quantize outside of llama-index
then save/load from there
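A minimal sketch of that approach with plain transformers (the local path is just an example, and saving serialized 4-bit weights needs a fairly recent transformers/bitsandbytes):

Plain Text
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# quantize with plain transformers, outside of llama-index
quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-alpha",
    quantization_config=quantization_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-alpha")

# write the quantized weights to disk (example path)
model.save_pretrained("./zephyr-7b-alpha-4bit")
tokenizer.save_pretrained("./zephyr-7b-alpha-4bit")

Reloading later is just AutoModelForCausalLM.from_pretrained("./zephyr-7b-alpha-4bit", device_map="auto"), and the result can be handed to HuggingFaceLLM directly, as in the snippet further down.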
can I get TheBloke's model?
or does it make any difference if I do the quantization myself?
you probably could, assuming it's not a gguf/ggml model

Just change the model name and remove the quantization config
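For an already-quantized repo on the Hub, that would look something like the sketch below; the repo id is an assumption (check the exact name on the Hub), and GPTQ repos additionally need optimum and auto-gptq installed:

Plain Text
from llama_index.llms import HuggingFaceLLM  # newer versions: llama_index.llms.huggingface

# no quantization_config needed; the repo's weights are already quantized
llm = HuggingFaceLLM(
    model_name="TheBloke/zephyr-7B-alpha-GPTQ",    # example repo id, not verified here
    tokenizer_name="TheBloke/zephyr-7B-alpha-GPTQ",
    context_window=3900,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "top_k": 50, "top_p": 0.95},
    device_map="auto",
)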
he actually has GGUF and a GPTQ format
would it be loadable with llama-index?
Probably some way to load it then, and then pass it in directly

Plain Text
model = AutoModelForCausalLM.from_pretrained(...)  # load the HuggingFace model yourself
llm = HuggingFaceLLM(model=model, ...)
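Filled in a bit more, that pattern could look like the following; the GPTQ repo id and the import path are assumptions (GPTQ repos need optimum and auto-gptq installed), and the same pattern works for a locally saved quantized model by swapping in its path:

Plain Text
from transformers import AutoModelForCausalLM, AutoTokenizer
from llama_index.llms import HuggingFaceLLM  # newer versions: llama_index.llms.huggingface

# load the pre-quantized model with transformers, then hand it to llama-index
model_id = "TheBloke/zephyr-7B-alpha-GPTQ"   # example repo id; a local path works too
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

llm = HuggingFaceLLM(
    model=model,
    tokenizer=tokenizer,
    context_window=3900,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.7, "top_k": 50, "top_p": 0.95},
    # plus messages_to_prompt / query_wrapper_prompt as in the original snippet
)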
thank you bro! 🥲
I'm running on a 4070 laptop actually, can I get away with not quantizing Zephyr-7B? My supervisor asked me to keep it lightweight, but if my PC can run it, I'll call it lightweight 🤣
Mmmm, how many GB is that GPU? Non-quantized you probably need at least 16GB of VRAM, I think...
16GB RAM with 8GB VRAM... I tried to make it work, I'm getting no errors but it's slow af. I'm working on getting a quantized version. Many thanks!
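For reference, the rough weights-only arithmetic behind those numbers (ignoring activations and the KV cache):

Plain Text
# back-of-the-envelope weight memory for a 7B-parameter model
params = 7e9
print(f"fp16 : {params * 2 / 1e9:.1f} GB")    # ~14 GB -> too big for 8 GB of VRAM, so it spills and crawls
print(f"4-bit: {params * 0.5 / 1e9:.1f} GB")  # ~3.5 GB -> fits comfortably on an 8 GB GPU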