Limit

Hi everyone! Can anyone tell me the max limit of max_new_tokens for the TheBloke/Llama-2-7b-Chat-GGUF model?
So, the thing about LLMs is that the input and output share the same context window.

Llama 2 has a 4096-token context window. Every token you reserve with max_new_tokens is subtracted from the maximum input size.

So, in theory, the max is 4095, but that leaves room for only one input token.
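Here's a minimal sketch of that arithmetic in plain Python, just to illustrate the budget (the 4096 constant is Llama 2's context window; the helper name is mine):

```python
CONTEXT_WINDOW = 4096  # Llama 2's total context size, in tokens

def max_input_tokens(max_new_tokens: int) -> int:
    # Whatever you reserve for output comes out of the input budget.
    return CONTEXT_WINDOW - max_new_tokens

print(max_input_tokens(4095))  # 1    -> only a single input token fits
print(max_input_tokens(512))   # 3584 -> tokens left for the prompt
```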
Okay, so that means if we set max_new_tokens = 4095, the model can only take 1 token per request as input?
Can you please explain the difference between the context window and max_new_tokens?
Context window is the max context size for the LLM

Max new tokens is how much of that context window should be reserved for output tokens
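To make that concrete, here's a sketch using llama-cpp-python (an assumption on my part, since the model is a GGUF file; the model path is a placeholder for your local download of TheBloke/Llama-2-7b-Chat-GGUF):

```python
from llama_cpp import Llama

# Placeholder path: point this at your local GGUF file.
llm = Llama(model_path="./llama-2-7b-chat.Q4_K_M.gguf", n_ctx=4096)

prompt = "What is the capital of France?"
prompt_tokens = llm.tokenize(prompt.encode("utf-8"))

# Reserve whatever the prompt doesn't use for the output.
budget = 4096 - len(prompt_tokens)
output = llm(prompt, max_tokens=budget)
print(output["choices"][0]["text"])
```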
Okay! Got it.
Thank you 🙂
Now my doubts are clear.