
Updated 2 years ago


At a glance

The post asks about a "Llama.generate: prefix-match hit" message. Community members in the comments explain that it comes from a built-in prompt cache in the llama.cpp library, which can speed up generation when a new prompt shares a prefix with a previous one. They also suggest setting verbose=False on the LLM object to reduce the amount of logging, since the llama.cpp library is noisy by default. One community member notes that the message is purely informational and nothing to worry about, and another expresses appreciation for the explanation.

“Llama.generate: prefix-match hit”
3 comments
llama.cpp has some sort of built-in cache; that just means your generation will be faster... I think

Setting verbose=False on the LLM object should hopefully reduce the amount of logging, although the llama.cpp library is pretty noisy in general
It's just a notification though, nothing to worry about 👍
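For reference, here is a rough sketch of where that setting would go, assuming the llama-cpp-python bindings are being used (the model path below is a placeholder, and the constructor call itself is left commented out since it needs a real GGUF model file):

```python
# Constructor arguments for llama_cpp.Llama (llama-cpp-python bindings).
# verbose=False suppresses most of llama.cpp's stderr logging, including
# informational lines like "Llama.generate: prefix-match hit".
llama_kwargs = {
    "model_path": "./models/model.gguf",  # placeholder -- point at your own GGUF file
    "verbose": False,                     # quiet down llama.cpp's logging
}

# from llama_cpp import Llama
# llm = Llama(**llama_kwargs)  # requires llama-cpp-python and a downloaded model
```

The "prefix-match hit" itself is separate from the verbose flag: it just means the new prompt shared a leading portion with a previous one, so the cached computation for that prefix was reused.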