
Llamacpp

At a glance

The community member is trying to use either HuggingFaceLLM or LlamaCPP, but the process keeps hanging on both. The logs show a very long load time, slow prompt-evaluation and generation speeds, and a long total time per run. The community member is running this on an EC2 machine, but it works fine on their local Mac M2 with LlamaCPP.

In the comments, other community members suggest the slowness comes from running on CPU without a GPU, and note that the timing logs will appear on every LLM call unless verbose=False is set. One community member recommends running the LLM on a GPU instead of the CPU.

I'm trying to use either HuggingFaceLLM or LlamaCPP.

On both tries, it keeps hanging with the following:

Plain Text
llama_print_timings:        load time = 159918.60 ms
llama_print_timings:      sample time =   240.40 ms /   256 runs   (    0.94 ms per token,  1064.88 tokens per second)
llama_print_timings: prompt eval time = 550695.06 ms /  3328 tokens (  165.47 ms per token,     6.04 tokens per second)
llama_print_timings:        eval time = 63265.28 ms /   255 runs   (  248.10 ms per token,     4.03 tokens per second)
llama_print_timings:       total time = 614952.10 ms
Llama.generate: prefix-match hit


This is running on an EC2 machine (that's pretty beefy).

It does work locally on my Mac M2 (with LlamaCPP).

Ideas or suggestions?
5 comments
Wowza, so it completed one iteration (which took.... many seconds)

And it's likely hitting a refine step 🫠
Tbh it doesn't matter which CPU it is, I really wouldn't try running this without a GPU
to be clear, I'm constantly getting these logs. like, hundreds of em.

but yea, will try on a GPU
yea, those logs will happen every time the LLM is run, unless verbose=False
llm = LlamaCPP(..., verbose=False)
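
For reference, a minimal sketch combining the two suggestions above (silence the timing logs and offload work to the GPU) using LlamaIndex's LlamaCPP wrapper. The model path is a placeholder, and the exact import path and n_gpu_layers behaviour depend on your llama-index / llama-cpp-python versions, so treat this as an assumption rather than the poster's actual setup.

Python
# Sketch only: load a local GGUF model, offload layers to the GPU,
# and turn off the llama_print_timings output shown above.
from llama_index.llms import LlamaCPP

llm = LlamaCPP(
    model_path="/path/to/model.gguf",    # hypothetical local model file
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    model_kwargs={"n_gpu_layers": -1},   # llama-cpp-python option: offload all layers to the GPU
    verbose=False,                       # suppress the per-call timing logs
)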