
Updated 2 years ago


At a glance

The post asks about a "Llama.generate: prefix-match hit" message. Community members in the comments explain that it comes from a built-in prompt cache in the llama.cpp library, which can speed up generation when a new prompt shares a prefix with a previous one. They also suggest setting verbose=False on the LLM object to reduce the amount of logging, since the llama.cpp library is noisy by default. One community member notes that the message is purely informational and nothing to worry about, and another expresses appreciation for the explanation.

“Llama.generate: prefix-match hit”
3 comments
llama.cpp has some sort of built-in cache; that just means your generation will be faster... I think

Setting verbose=False on the LLM object should hopefully reduce the amount of logging, although the llama.cpp library is pretty noisy in general
It's just a notification though, nothing to worry about 👍
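For reference, here is a rough sketch of where that setting would go, assuming the llama-cpp-python bindings are being used (the model path below is a placeholder, and the constructor call itself is left commented out since it needs a real GGUF model file):

```python
# Constructor arguments for llama_cpp.Llama (llama-cpp-python bindings).
# verbose=False suppresses most of llama.cpp's stderr logging, including
# informational lines like "Llama.generate: prefix-match hit".
llama_kwargs = {
    "model_path": "./models/model.gguf",  # placeholder -- point at your own GGUF file
    "verbose": False,                     # quiet down llama.cpp's logging
}

# from llama_cpp import Llama
# llm = Llama(**llama_kwargs)  # requires llama-cpp-python and a downloaded model
```

The "prefix-match hit" itself is separate from the verbose flag: it just means the new prompt shared a leading portion with a previous one, so the cached computation for that prefix was reused.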