Hi guys, I haven't been using LlamaIndex for a while, but all of a sudden my basic script for launching a self-hosted LLM stopped working. Here's the thing:
Plain Text
import logging

import torch
from llama_index.core import PromptTemplate
from llama_index.llms.huggingface import HuggingFaceLLM


def run_llm_old(system_prompt, model_name='meta-llama/Llama-2-13b-chat-hf',
                quantization_config=None):
    # Wrap every query in the Llama-2 chat format with the given system prompt.
    query_wrapper_prompt = PromptTemplate(
        "[INST]<<SYS>>\n" + system_prompt + "<</SYS>>\n\n{query_str}[/INST] "
    )
    # Load quantized if a config is given, otherwise full weights in fp16.
    if quantization_config:
        model_kwargs = {"quantization_config": quantization_config}
    else:
        model_kwargs = {"torch_dtype": torch.float16}
    logging.info(f"Configuring model {model_name}")
    llm = HuggingFaceLLM(
        model_name=model_name,
        tokenizer_name=model_name,
        context_window=8192,  # NB: Llama-2 was trained with a 4096-token context
        max_new_tokens=1024,
        model_kwargs=model_kwargs,
        query_wrapper_prompt=query_wrapper_prompt,
        generate_kwargs={"temperature": 0.35, "top_k": 40, "top_p": 0.9, "do_sample": True},
        device_map="auto",
    )
    logging.info(f"LLM {model_name} configured successfully")
    return llm
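
For context, this is roughly how the function gets used downstream (a minimal sketch; the system prompt and query here are just placeholders):
Plain Text
# Minimal usage sketch; the prompt and query are placeholders.
llm = run_llm_old("You are a helpful assistant.")
response = llm.complete("What does LlamaIndex do?")
print(response.text)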

This function was working flawlessly a few months ago. Now, with logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s') enabled, it only gets this far:
Plain Text
2024-06-21 10:14:51,099 - INFO - Configuring model meta-llama/Llama-2-13b-chat-hf
2024-06-21 10:14:51,101 - DEBUG - Starting new HTTPS connection (1): huggingface.co:443
2024-06-21 10:14:51,398 - DEBUG - https://huggingface.co:443 "HEAD /meta-llama/Llama-2-13b-chat-hf/resolve/main/config.json HTTP/1.1" 200 0

and then gets stuck without ever loading the model shards. If I delete the whole model cache directory, the shards are re-downloaded, but it still hangs after:
Plain Text
model-00003-of-00003.safetensors: 100%|[...]|
Downloading shards: 100%|[...]|
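
In case it helps narrow this down: loading the same checkpoint directly with transformers (bypassing LlamaIndex entirely) should show whether the hang is inside HuggingFaceLLM or in the underlying model loading. A minimal sketch with the same settings, quantization omitted:
Plain Text
# Load the checkpoint directly; if this also hangs, the problem is in
# transformers / the local environment rather than in LlamaIndex.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-13b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)
print("model loaded")

Setting the environment variable HF_HUB_OFFLINE=1 (or passing local_files_only=True through model_kwargs) forces everything to load from the local cache, which would rule out Hugging Face Hub connectivity as the culprit.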