The post describes an error when trying to load a model file, and the comments discuss issues with using the llama.cpp library. Community members suggest using ollama instead, since it has less configuration overhead and can run any gguf model. They also mention performance problems with llama.cpp, such as high CPU usage, and share links to alternative resources for working with LLMs from Python.
Eventually I got it to run, but two issues remain. First, I get this warning:

.conda/lib/python3.12/site-packages/llama_cpp/llama.py:1138: RuntimeWarning: Detected duplicate leading "<s>" in prompt, this will likely reduce response quality, consider removing it... warnings.warn(
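For what it's worth, llama-cpp-python prepends the BOS token to the prompt itself, so this warning usually means the prompt string (or the tutorial's prompt template) already starts with a literal "<s>". A minimal sketch of calling the library without a hand-written "<s>" (the model path is a placeholder):

```python
from llama_cpp import Llama

# placeholder path; point this at your actual gguf file
llm = Llama(model_path="./model.gguf", verbose=False)

# llama.cpp adds "<s>" (BOS) on its own, so the prompt string should
# not start with a literal "<s>" -- that is what triggers the warning
prompt = "Q: What is the capital of France? A:"
out = llm(prompt, max_tokens=32, stop=["\n"])
print(out["choices"][0]["text"])
```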
Second, it was killing my CPU: usage went up to 99%. I rarely had issues like that with ollama, even with much larger LLMs, which is very strange, since I thought llama-index was much more optimized. I am following the tutorial code very closely, so I am not sure what I am missing here.
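One common cause of a pegged CPU is that llama.cpp falls back to CPU-only inference with many threads when no layers are offloaded to the GPU. A hedged sketch of constructor options that cap CPU usage, assuming the same placeholder model path and a GPU-enabled build of llama-cpp-python:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./model.gguf",  # placeholder path
    n_threads=4,       # cap CPU threads instead of using (close to) all cores
    n_gpu_layers=-1,   # offload all layers to the GPU; needs a CUDA/Metal build
    n_ctx=2048,        # context window size
)
```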
My goal is to get much more granular control over my hardware, so that I can customize GPU layers and run benchmarks or other applications from Python.
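As a starting point for that kind of benchmarking, here is a rough sketch (model path, prompt, and layer counts are all placeholders) that reloads the model with different n_gpu_layers values and compares generation throughput:

```python
import time
from llama_cpp import Llama

# try a few offload settings; -1 means offload every layer to the GPU
for n_layers in (0, 16, 32, -1):
    llm = Llama(model_path="./model.gguf", n_gpu_layers=n_layers, verbose=False)
    start = time.perf_counter()
    out = llm("Write one sentence about benchmarks.", max_tokens=128)
    elapsed = time.perf_counter() - start
    tokens = out["usage"]["completion_tokens"]
    print(f"n_gpu_layers={n_layers}: {tokens / elapsed:.1f} tok/s")
    del llm  # free the model before loading the next configuration
```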