
Llamacpp

At a glance

The community member is trying to use either HuggingFaceLLM or LlamaCPP, but the process keeps hanging on both. The logs show a very long load time, slow prompt-evaluation and generation speeds, and a long total time per run. The community member is running this on an EC2 machine, but it works fine on their local Mac M2 with LlamaCPP.

In the comments, other community members suggest the slowness comes from running on CPU without a GPU, and note that the timing logs will appear on every LLM call unless verbose=False is set. One community member recommends running the LLM on a GPU instead of the CPU.

I'm trying to use either HuggingFaceLLM or LlamaCPP.

On both tries, it keeps hanging with the following:

Plain Text
llama_print_timings:        load time = 159918.60 ms
llama_print_timings:      sample time =   240.40 ms /   256 runs   (    0.94 ms per token,  1064.88 tokens per second)
llama_print_timings: prompt eval time = 550695.06 ms /  3328 tokens (  165.47 ms per token,     6.04 tokens per second)
llama_print_timings:        eval time = 63265.28 ms /   255 runs   (  248.10 ms per token,     4.03 tokens per second)
llama_print_timings:       total time = 614952.10 ms
Llama.generate: prefix-match hit


This is running on an EC2 machine (that's pretty beefy).

It does work locally on my Mac M2 (with LlamaCPP).

Ideas or suggestions?
5 comments
Wowza, so it completed one iteration (which took.... many seconds)

And it's likely hitting a refine step 🫠
Tbh it doesn't matter which CPU it is, I really wouldn't try running this without a GPU
to be clear, I'm constantly getting these logs. like, hundreds of em.

but yea, will try on a GPU
yea, those logs will happen every time the LLM is run, unless verbose=False
llm = LlamaCPP(..., verbose=False)
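
For reference, a minimal sketch combining the two suggestions above (silence the timing logs and offload work to the GPU) using LlamaIndex's LlamaCPP wrapper. The model path is a placeholder, and the exact import path and n_gpu_layers behaviour depend on your llama-index / llama-cpp-python versions, so treat this as an assumption rather than the poster's actual setup.

Python
# Sketch only: load a local GGUF model, offload layers to the GPU,
# and turn off the llama_print_timings output shown above.
from llama_index.llms import LlamaCPP

llm = LlamaCPP(
    model_path="/path/to/model.gguf",    # hypothetical local model file
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    model_kwargs={"n_gpu_layers": -1},   # llama-cpp-python option: offload all layers to the GPU
    verbose=False,                       # suppress the per-call timing logs
)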