I'm trying to use either HuggingFaceLLM or LlamaCPP. On both tries, it hangs with the following output:
llama_print_timings: load time = 159918.60 ms
llama_print_timings: sample time = 240.40 ms / 256 runs ( 0.94 ms per token, 1064.88 tokens per second)
llama_print_timings: prompt eval time = 550695.06 ms / 3328 tokens ( 165.47 ms per token, 6.04 tokens per second)
llama_print_timings: eval time = 63265.28 ms / 255 runs ( 248.10 ms per token, 4.03 tokens per second)
llama_print_timings: total time = 614952.10 ms
Llama.generate: prefix-match hit
This is running on a fairly beefy EC2 instance. It does work locally on my Mac M2 (with LlamaCPP).
Any ideas or suggestions?