Hey guys, I feel stupid posting this, but after quantizing the model like this:

```python
from llama_index.core import PromptTemplate
from llama_index.llms.huggingface import HuggingFaceLLM

# quantization_config (a BitsAndBytesConfig) and messages_to_prompt
# are defined earlier in my script
llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-alpha",
    tokenizer_name="HuggingFaceH4/zephyr-7b-alpha",
    query_wrapper_prompt=PromptTemplate(
        "<|system|>\n</s>\n<|user|>\n{query_str}</s>\n<|assistant|>\n"
    ),
    context_window=3900,
    max_new_tokens=256,
    model_kwargs={"quantization_config": quantization_config},
    # tokenizer_kwargs={},
    generate_kwargs={"temperature": 0.7, "top_k": 50, "top_p": 0.95},
    messages_to_prompt=messages_to_prompt,
    device_map="auto",
)
```

how are we supposed to store the quantized model locally?
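In case it helps frame the question, this is the kind of thing I was expecting to work with plain transformers before handing the path to `HuggingFaceLLM` (just a sketch: the local directory name is my own example, and I haven't verified that serializing bitsandbytes-quantized weights works on every transformers/bitsandbytes version):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "HuggingFaceH4/zephyr-7b-alpha"

# 4-bit NF4 quantization via bitsandbytes
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Write the (quantized) weights to a local directory, then point
# HuggingFaceLLM's model_name/tokenizer_name at that path instead of the hub id
save_dir = "./zephyr-7b-alpha-4bit"  # example path, not from the original setup
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)
```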
I'm running on a 4070 laptop. Actually, can I get away with not quantizing Zephyr-7B at all? My supervisor asked me to keep it lightweight, but if my PC can run it, I'll call it lightweight 🤣
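My napkin math on the VRAM question (assuming the weights dominate memory use and the laptop 4070's 8 GB of VRAM; KV cache and activations add overhead on top of this):

```python
# Back-of-the-envelope weight-memory estimate for a 7B-parameter model
params = 7_000_000_000

for label, bytes_per_param in [("fp16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    gib = params * bytes_per_param / 1024**3
    print(f"{label:>5}: ~{gib:.1f} GiB of weights")

# fp16 : ~13.0 GiB -> won't fit in a laptop 4070's 8 GiB of VRAM
# 8-bit: ~6.5 GiB  -> tight but plausible
# 4-bit: ~3.3 GiB  -> comfortable
```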