Hi, I'm following the LlamaCPP example in the documentation, but I get an error when trying to use a Hugging Face model. I'm running on an Intel CPU.
https://gpt-index.readthedocs.io/en/v0.9.2/examples/llm/llama_2_llama_cpp.html

model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt

llm = LlamaCPP(
    # You can pass in the URL to a GGML model to download it automatically
    model_url=model_url,
    # optionally, you can set the path to a pre-downloaded model instead of model_url
    model_path=None,
    temperature=0.1,
    max_new_tokens=256,
    # llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room
    context_window=3900,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__()
    # set to at least 1 to use GPU; I set this to 0 since I don't have a GPU
    model_kwargs={"n_gpu_layers": 0},
    # transform inputs into Llama2 format
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)
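Once the model loads, I just call complete() on it as in the docs page above; a minimal sketch (the prompt text is only a placeholder), included to show that the crash happens during model loading, before any generation runs:

# Simple completion call following the docs example; prompt is a placeholder
response = llm.complete("Hello! Can you write a short poem about cats?")
print(response.text)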
This is the error I get:

gguf_init_from_file: invalid magic characters tjgg.
error loading model: llama_model_loader: failed to load model from /tmp/llama_index/models/llama-2-13b-chat.ggmlv3.q4_0.bin
llama_load_model_from_file: failed to load model
AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |

Does anybody know if I should change the version of the model or of the llama-cpp-python package?
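In case it's useful, here is a small sketch I'd use to inspect the first bytes of the downloaded file (the path is taken from the error output above); as far as I understand, newer llama.cpp builds look for a GGUF magic, while this file reports "tjgg", which is the old GGML/GGJT magic:

# Check the file's magic bytes (path copied from the error message above)
with open("/tmp/llama_index/models/llama-2-13b-chat.ggmlv3.q4_0.bin", "rb") as f:
    magic = f.read(4)
# b"GGUF" would indicate the new format; b"tjgg" matches the magic reported in the error
print(magic)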
I've tried with this version, for instance, but it also didn't work:
!pip install llama-cpp-python==0.1.78
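And a quick sanity check (standard library only, so just a sketch) to confirm which llama-cpp-python build the environment actually picks up after the reinstall, since in a notebook the kernel has to be restarted for the pin to take effect:

from importlib.metadata import version

# Confirm which llama-cpp-python version is actually installed in this environment
print(version("llama-cpp-python"))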