
Value Error: Failed to Load Model from File

ValueError: Failed to load model from file: /tmp/llama_index/models/llama-2-13b-chat.ggmlv3.q4_0.bin
ggml is not supported by llama.cpp anymore, it's a very old format
but tbh, I wouldn't use llama.cpp
use ollama, there is way too much config to worry about with llama.cpp
ollama can run any gguf model
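if you switch to ollama, the llama_index side is only a couple of lines. a minimal sketch, assuming the `llama-index-llms-ollama` integration is installed, the ollama server is running, and the model has already been pulled (the `llama2:13b-chat` tag is just an example):

```python
# sketch: point llama_index at a locally running Ollama server
# assumes: pip install llama-index-llms-ollama
#          ollama pull llama2:13b-chat   (example tag)
from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama2:13b-chat", request_timeout=120.0)

response = llm.complete("Explain the difference between GGML and GGUF in one sentence.")
print(response.text)
```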
eventually i got it to run, but for 1, i have this warning: .conda/lib/python3.12/site-packages/llama_cpp/llama.py:1138: RuntimeWarning: Detected duplicate leading "<s>" in prompt, this will likely reduce response quality, consider removing it...
warnings.warn(
and for 2, it was killing my cpu, usage went up to 99%. i rarely had any such issues with ollama, even with much larger llms, which is very strange, since i thought llama-index was much more optimized. but i am following the tutorial code very closely, so i am not sure what i am missing here.
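on point 1, the warning usually means the formatted prompt already starts with "<s>" while llama_cpp also adds a BOS token when it tokenizes, so it ends up doubled. a sketch of one way to avoid it, assuming you are using llama_index's LlamaCPP wrapper and its stock prompt helpers (import paths may differ slightly between llama-index versions):

```python
# sketch: strip the leading "<s>" from the formatted prompt so llama_cpp's
# own BOS token is the only one; paths and params here are illustrative
from llama_index.llms.llama_cpp import LlamaCPP
from llama_index.llms.llama_cpp.llama_utils import completion_to_prompt

def completion_to_prompt_no_bos(completion: str) -> str:
    prompt = completion_to_prompt(completion)
    # only drops "<s>" if the default formatter actually prepends it
    return prompt.removeprefix("<s>")

llm = LlamaCPP(
    model_path="/path/to/llama-2-13b-chat.Q4_0.gguf",  # GGUF, not GGML
    completion_to_prompt=completion_to_prompt_no_bos,
)
```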
my idea is to have much more granular control over my hardware, so that i can customize gpu layers and run benchmarks or other applications with python.
that was my intention
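for the cpu spike and the gpu-layer control, the LlamaCPP wrapper forwards `model_kwargs` straight to `llama_cpp.Llama`, so the same knobs you would tune in ollama or raw llama.cpp are available there. a sketch with example values (layer count and thread count depend on your hardware):

```python
# sketch: cap CPU threads and offload layers to the GPU via model_kwargs
from llama_index.llms.llama_cpp import LlamaCPP

llm = LlamaCPP(
    model_path="/path/to/llama-2-13b-chat.Q4_0.gguf",
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    model_kwargs={
        "n_gpu_layers": 35,  # layers offloaded to the GPU (-1 = all)
        "n_threads": 8,      # limit CPU threads so it doesn't peg every core
    },
    verbose=True,
)
```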
i am also considering following this guide to learn more about llama-cpp-python
it does not seem to use llama-index, though
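if you do go directly through llama-cpp-python without llama-index, the same options sit on the `Llama` constructor. a minimal sketch (the model path is a placeholder):

```python
from llama_cpp import Llama

# load a GGUF model directly with llama-cpp-python
llm = Llama(
    model_path="/path/to/llama-2-13b-chat.Q4_0.gguf",
    n_gpu_layers=-1,  # offload every layer to the GPU if one is available
    n_threads=8,      # CPU threads for whatever stays on the CPU
    n_ctx=4096,       # context window
)

out = llm("Q: What format replaced GGML in llama.cpp? A:", max_tokens=32)
print(out["choices"][0]["text"])
```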