
LlamaCPP

I'm trying to use either HuggingFaceLLM or LlamaCPP.

On both tries, it keeps hanging with the following:

```
llama_print_timings:        load time = 159918.60 ms
llama_print_timings:      sample time =   240.40 ms /   256 runs   (    0.94 ms per token,  1064.88 tokens per second)
llama_print_timings: prompt eval time = 550695.06 ms /  3328 tokens (  165.47 ms per token,     6.04 tokens per second)
llama_print_timings:        eval time = 63265.28 ms /   255 runs   (  248.10 ms per token,     4.03 tokens per second)
llama_print_timings:       total time = 614952.10 ms
Llama.generate: prefix-match hit
```


This is running on an EC2 instance (a pretty beefy one).

It does work locally on my Mac M2 (with LlamaCPP).

Ideas or suggestions?
5 comments
Wowza, so it completed one iteration (which took... many seconds; ~615 s going by the total time above)

And it's likely hitting a refine step 🫠
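
For context, refine-style synthesis calls the LLM once per retrieved chunk, so a timing block like the one above prints for every call. Packing chunks into fewer calls with compact synthesis would look roughly like this (a sketch; the `index` object is a hypothetical stand-in for an existing llama-index index):

```python
# Sketch, assuming `index` is an already-built llama-index VectorStoreIndex.
# "compact" stuffs retrieved chunks into as few LLM calls as possible,
# instead of issuing one refine call per chunk.
query_engine = index.as_query_engine(response_mode="compact")
response = query_engine.query("your question here")
```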
Tbh the CPU doesn't matter; I really wouldn't try running this without a GPU
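
If the EC2 box does have a GPU, offloading the model layers looks roughly like this — a minimal sketch using llama-index's LlamaCPP wrapper, with a placeholder model path:

```python
# Sketch, not the thread's exact setup. Requires llama-cpp-python built
# with CUDA (or Metal) support for the offload to take effect.
from llama_index.llms.llama_cpp import LlamaCPP
# (older llama-index versions: from llama_index.llms import LlamaCPP)

llm = LlamaCPP(
    model_path="./models/model.gguf",  # hypothetical local GGUF file
    temperature=0.1,
    max_new_tokens=256,
    context_window=3900,
    # model_kwargs are passed through to llama_cpp.Llama;
    # n_gpu_layers=-1 offloads every layer to the GPU
    model_kwargs={"n_gpu_layers": -1},
)
```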
to be clear, I'm constantly getting these logs. like, hundreds of 'em.

but yea, will try on a GPU
yea, those logs will print every time the LLM runs, unless you set verbose=False:

```python
llm = LlamaCPP(..., verbose=False)
```
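
A fuller version of that call might look like the following (the path and GPU offload are placeholders carried over from the sketches above):

```python
llm = LlamaCPP(
    model_path="./models/model.gguf",   # hypothetical local GGUF path
    model_kwargs={"n_gpu_layers": -1},  # keep the GPU offload from above
    verbose=False,  # suppresses the llama_print_timings block on each call
)
```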