llm = HuggingFaceLLM(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    query_wrapper_prompt=PromptTemplate("<s> [INST] {query_str} [/INST] "),
    context_window=3900,
    model_kwargs={"token": hf_token, "quantization_config": quantization_config},
    tokenizer_kwargs={"token": hf_token},
    device_map="auto",
)
HuggingFaceLLM system_prompt none is not an allowed value (type=type_error.none.not_allowed)
And I haven't been able to figure it out. (I'm a complete newbie here and this is my first time going through the LlamaIndex documentation, etc.) Has anybody run into this before? I'm running on Windows 11 under WSL (Ubuntu).

INFO:accelerate.utils.modeling:We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00, 1.17it/s]
Traceback (most recent call last):
  File "/root/llm-contextualize/starter.py", line 34, in <module>
    llm = HuggingFaceLLM(
  File "/root/llm-contextualize/venv/lib/python3.10/site-packages/llama_index/llms/huggingface.py", line 228, in __init__
    super().__init__(
  File "/root/llm-contextualize/venv/lib/python3.10/site-packages/pydantic/v1/main.py", line 341, in __init__
    raise validation_error
pydantic.v1.error_wrappers.ValidationError: 1 validation error for HuggingFaceLLM
system_prompt
  none is not an allowed value (type=type_error.none.not_allowed)
system_prompt = """<|SYSTEM|># StableLM Tuned (Alpha version) - StableLM is a helpful and harmless open-source AI language model developed by StabilityAI. - StableLM is excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user. - StableLM is more than just an information source, StableLM is also able to write poetry, short stories, and make jokes. - StableLM will refuse to participate in anything that could harm a human. """ quantization_config = BitsAndBytesConfig( load_in_4bit=True, # No idea what this means bnb_4bit_compute_dtype=torch.float16, # no idea why this is needed bnb_4bit_quant_type="nf4", # Magic string, no idea what this means bnb_4bit_use_double_quant=True, # No idea about what this does ) llm = HuggingFaceLLM( model_name="meta-llama/Llama-2-7b-chat-hf", tokenizer_name="meta-llama/Llama-2-7b-chat-hf", query_wrapper_prompt=PromptTemplate("<s> [INST] {query_str} [/INST] "), context_window=3900, system_prompt=system_prompt, model_kwargs={"token": hf_token, "quantization_config": quantization_config}, tokenizer_kwargs={"token": hf_token}, device_map="auto", )
"<s> [INST] {query_str} [/INST] "
is pretty cryptic to me.query_wrapper_prompt=PromptTemplate("<s> [INST] {query_str} [/INST] "),
-- it is cryptic, and you can blame the llama2 creators 🙂

Passing system_prompt="" should work around the error for now, I think -- I'll patch the actual bug in the library.
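For anyone else who finds the wrapper string opaque: the [INST] ... [/INST] markers are Llama-2-chat's expected prompt markup, and the system prompt conventionally goes inside <<SYS>> tags within the first instruction. A rough illustration of how a final prompt string ends up looking (this shows the general Llama-2-chat convention, not necessarily the exact template llama_index assembles internally; the system prompt and query below are made up):

# Rough illustration of the Llama-2-chat prompt convention.
system_prompt = "You are a helpful assistant."  # hypothetical system prompt
query_str = "Summarize the attached document."  # hypothetical user query

prompt = (
    "<s> [INST] "
    f"<<SYS>>\n{system_prompt}\n<</SYS>>\n\n"  # system prompt wrapped in <<SYS>> tags
    f"{query_str}"                             # the user's actual question
    " [/INST] "
)
print(prompt)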