When I load the model `stabilityai/stablelm-2-zephyr-1_6b` like this:
```python
llm = HuggingFaceLLM(
    # https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b
    model_name="stabilityai/stablelm-2-zephyr-1_6b",
    tokenizer_name="stabilityai/stablelm-2-zephyr-1_6b",
    query_wrapper_prompt=PromptTemplate("<|system|>\n\n<|user|>\n{query_str}\n<|assistant|>\n"),
    context_window=3900,
    max_new_tokens=256,
    # add "torch_dtype": torch.float16 here if using CUDA, to reduce memory usage
    model_kwargs={"trust_remote_code": True},
    # tokenizer_kwargs={"max_length": 2048},
    generate_kwargs={"temperature": 0.7, "top_k": 50, "top_p": 0.95, "do_sample": True},
    messages_to_prompt=messages_to_prompt,
    device_map="auto",
)
```
although I set `trust_remote_code` to `True`, I still get the interactive question:

> Do you wish to run the custom code? [y/N]
```
.....
model.safetensors: 100%  3.29G/3.29G [00:39<00:00, 111MB/s]
generation_config.json: 100%  121/121 [00:00<00:00, 7.17kB/s]
tokenizer_config.json: 100%  825/825 [00:00<00:00, 37.4kB/s]
The repository for stabilityai/stablelm-2-zephyr-1_6b contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/stabilityai/stablelm-2-zephyr-1_6b.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y
tokenization_arcade100k.py: 100%  9.89k/9.89k [00:00<00:00, 463kB/s]
....
```
Any idea how to avoid this prompt?
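One detail that may matter: the prompt appears right before `tokenization_arcade100k.py` is downloaded, so I suspect it is the *tokenizer* load (not the model load) that still needs the flag. Below is a sketch of what I mean; passing `trust_remote_code` via `tokenizer_kwargs` is my unconfirmed assumption, not something I have verified fixes the issue:

```python
# Hypothesis (unconfirmed): the remaining prompt comes from the tokenizer,
# whose custom tokenization_arcade100k.py must also be approved, so the flag
# may need to go into tokenizer_kwargs as well as model_kwargs.
model_kwargs = {"trust_remote_code": True}
tokenizer_kwargs = {"trust_remote_code": True}  # the part I believe is missing

# The actual call would then be (commented out to avoid the ~3 GB download):
# llm = HuggingFaceLLM(
#     model_name="stabilityai/stablelm-2-zephyr-1_6b",
#     tokenizer_name="stabilityai/stablelm-2-zephyr-1_6b",
#     model_kwargs=model_kwargs,
#     tokenizer_kwargs=tokenizer_kwargs,
#     device_map="auto",
# )
```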