Max input size

Just wanted to clarify for PromptHelper/ServiceContext: max_input_size is how many tokens the input can be, and num_output is how many tokens the output can be, right? It seems a bit strange for the docs to show max_input_size as 4096 for gpt-3.5, since that's the max context length (which should be the cap on input + output tokens combined). Wouldn't we actually want max_input_size + num_output to be at most 4096?
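
For context, here's a minimal sketch of the setup being asked about, assuming an older llama_index version where PromptHelper and ServiceContext take these arguments (the specific values are just examples):

```python
from llama_index import PromptHelper, ServiceContext

# max_input_size: total context window of the model (e.g. 4096 for gpt-3.5-turbo)
# num_output: tokens reserved for the model's generated answer
# max_chunk_overlap: overlap between chunks when context is split
prompt_helper = PromptHelper(
    max_input_size=4096,
    num_output=256,
    max_chunk_overlap=20,
)

service_context = ServiceContext.from_defaults(prompt_helper=prompt_helper)
```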
Yea, it's a little weird because with decoder models like GPT, the input and output are connected

Tokens are generated one at a time, each time appended to the input before generating the next

So yes, the max input size is 4096. But when llama index sends requests, it needs to make sure there's room for num_output tokens
That would mean you want max_input_size in your PromptHelper to be less than 4096, right?
Nah. Internally it does the math with max_input_size set to the max context size of the model
If max_input_size is 4096 and num_output is 256, that means everything sent to OpenAI should be at most 4096 minus 256 tokens
And the prompt helper figures that all out by trimming and splitting the context text
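
In other words, num_output is a reservation taken out of the total context window. A quick back-of-the-envelope check with the numbers from the thread (plain Python, values illustrative):

```python
max_input_size = 4096   # total context window for gpt-3.5-turbo
num_output = 256        # tokens reserved for the generated answer

# Space left for the prompt itself: instructions, query, and retrieved context.
available_for_prompt = max_input_size - num_output
print(available_for_prompt)  # 3840 tokens
```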
Ah ok, awesome
Though it feels like maybe it should be renamed lol, it's a bit confusing otherwise with all these terms being so similar