Limit

Hi everyone! Can anyone tell me the max limit of max_new_tokens for the TheBloke/Llama-2-7b-Chat-GGUF model?
So, the thing about LLMs is that the input and output share the same context window.

Llama 2 has a 4096-token context window. Every token you reserve with max_new_tokens is subtracted from the maximum input size.

So, in theory, the max is 4095, but that leaves room for only one input token.
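Here's a minimal sketch of that arithmetic in plain Python, just to illustrate the budget (the 4096 constant is Llama 2's context window; the helper name is mine):

```python
CONTEXT_WINDOW = 4096  # Llama 2's total context size, in tokens

def max_input_tokens(max_new_tokens: int) -> int:
    # Whatever you reserve for output comes out of the input budget.
    return CONTEXT_WINDOW - max_new_tokens

print(max_input_tokens(4095))  # 1    -> only a single input token fits
print(max_input_tokens(512))   # 3584 -> tokens left for the prompt
```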
Okay, so that means if we set max_new_tokens = 4095, the model can only take 1 token per request as input?
Can you please explain the difference between the context window and max_new_tokens?
Context window is the max context size for the LLM

Max new tokens is how much of that context window should be reserved for output tokens
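To make that concrete, here's a sketch using llama-cpp-python (an assumption on my part, since the model is a GGUF file; the model path is a placeholder for your local download of TheBloke/Llama-2-7b-Chat-GGUF):

```python
from llama_cpp import Llama

# Placeholder path: point this at your local GGUF file.
llm = Llama(model_path="./llama-2-7b-chat.Q4_K_M.gguf", n_ctx=4096)

prompt = "What is the capital of France?"
prompt_tokens = llm.tokenize(prompt.encode("utf-8"))

# Reserve whatever the prompt doesn't use for the output.
budget = 4096 - len(prompt_tokens)
output = llm(prompt, max_tokens=budget)
print(output["choices"][0]["text"])
```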
Okay! Got it.
Thank you 🙂
Now my doubts are clear.