Hi guys, I haven't been using LlamaIndex for a while, but all of a sudden my basic script for launching a self-hosted LLM stopped working. Here's the code:
import logging

import torch
# imports assume the post-0.10 llama-index package layout (llama-index-llms-huggingface installed)
from llama_index.core import PromptTemplate
from llama_index.llms.huggingface import HuggingFaceLLM


def run_llm_old(system_prompt, model_name='meta-llama/Llama-2-13b-chat-hf',
                quantization_config=None):
    # wrap every query in the Llama-2 chat [INST]/<<SYS>> template
    query_wrapper_prompt = PromptTemplate(
        "[INST]<<SYS>>\n" + system_prompt + "<</SYS>>\n\n{query_str}[/INST] "
    )
    # use the caller's quantization config if given, otherwise load in fp16
    if quantization_config:
        model_kwargs = {"quantization_config": quantization_config}
    else:
        model_kwargs = {"torch_dtype": torch.float16}
    logging.info(f"Configuring model {model_name}")
    llm = HuggingFaceLLM(
        model_name=model_name,
        tokenizer_name=model_name,
        context_window=8192,
        max_new_tokens=1024,
        model_kwargs=model_kwargs,
        query_wrapper_prompt=query_wrapper_prompt,
        generate_kwargs={"temperature": 0.35, "top_k": 40, "top_p": 0.9, "do_sample": True},
        device_map="auto",
    )
    logging.info(f"LLM {model_name} configured successfully")
    return llm
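For context, this is roughly how I invoke it; the 4-bit BitsAndBytesConfig below is just an illustration, not necessarily the exact settings I run with:

from transformers import BitsAndBytesConfig

# illustrative 4-bit config; any config transformers accepts goes down the same code path
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

llm = run_llm_old("You are a helpful assistant.", quantization_config=bnb_config)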
This function was working flawlessly a few months ago. Now, with
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')
enabled, the log only gets this far:
2024-06-21 10:14:51,099 - INFO - Configuring model meta-llama/Llama-2-13b-chat-hf
2024-06-21 10:14:51,101 - DEBUG - Starting new HTTPS connection (1): huggingface.co:443
2024-06-21 10:14:51,398 - DEBUG - https://huggingface.co:443 "HEAD /meta-llama/Llama-2-13b-chat-hf/resolve/main/config.json HTTP/1.1" 200 0
and then gets stuck without loading the model shards. If I delete the whole model directory, the shards are downloaded again, but it still hangs after:
model-00003-of-00003.safetensors: 100%|[...]|
Downloading shards: 100%|[...]|
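To narrow things down, my next step would be to load the same checkpoint directly with transformers (bypassing LlamaIndex entirely) and see whether the shard loading itself hangs; a minimal sketch:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# same checkpoint, dtype and device mapping as in run_llm_old, just without LlamaIndex
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-13b-chat-hf")
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-chat-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)

Has anyone run into this, or any idea what changed?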