“Llama.generate: prefix-match hit”
llama.cpp has a built-in prompt cache; a prefix-match hit just means the start of your new prompt matched the previous one, so the cached tokens get reused and generation will be faster... I think
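For anyone curious, here is a minimal sketch of when that message shows up, assuming the llama-cpp-python bindings and a hypothetical local GGUF model path (swap in your own). Two calls that share a prompt prefix let the second call reuse the cached tokens:

```python
from llama_cpp import Llama

# Hypothetical model path; replace with your own GGUF file.
llm = Llama(model_path="./models/llama-2-7b.Q4_K_M.gguf")

system = "You are a helpful assistant.\n\n"

# First call evaluates the whole prompt and caches its tokens.
llm(system + "Q: What is the capital of France?\nA:", max_tokens=32)

# Second call shares the system-prompt prefix, so llama-cpp-python
# reuses the cached tokens and logs "Llama.generate: prefix-match hit".
llm(system + "Q: What is the capital of Spain?\nA:", max_tokens=32)
```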

Setting verbose=False on the LLM object should hopefully reduce the amount of logging, although the llama.cpp library is pretty noisy in general
It's just a notification though, nothing to worry about 👍
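In case it helps, a short sketch of silencing that output, again assuming llama-cpp-python and a hypothetical model path; verbose=False suppresses most of the library's logging, including this notification:

```python
from llama_cpp import Llama

# verbose=False quiets llama.cpp's startup and per-call log output,
# including the "Llama.generate: prefix-match hit" notification.
llm = Llama(
    model_path="./models/llama-2-7b.Q4_K_M.gguf",  # hypothetical path
    verbose=False,
)

out = llm("Hello, world!", max_tokens=16)
print(out["choices"][0]["text"])
```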
Awesome, thanks!