Quick question - Can someone clarify what num_outputs refers to in the PromptHelper class? Is that the number of tokens allowed in the output? The name is throwing me off
Also the only place I can see it used is here:
result = (
    self.max_input_size - num_prompt_tokens - self.num_output
) // num_chunks

but I don't get how that's useful...
num_output is used to make sure the inputs to the LLM are small enough to leave room for generating num_output tokens
With models like openai, the input and output share the same context window

Tokens are generated one at a time, each one added to the input before generating the next token
So llama index needs to make sure the prompts to the LLM are small enough to have room to generate a response
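In other words, something like this rough check (the constants and function here are illustrative, not the actual llama_index internals):

# Illustrative sketch: the prompt must leave room for the reserved output tokens.
MAX_INPUT_SIZE = 4096   # model context window, in tokens
NUM_OUTPUT = 256        # tokens reserved for the generated response

def prompt_fits(num_prompt_tokens: int) -> bool:
    """True if the prompt still leaves NUM_OUTPUT tokens free for generation."""
    return num_prompt_tokens + NUM_OUTPUT <= MAX_INPUT_SIZE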
hmm ok but then what does num_outputs refer to? Like how do I calculate what to put there?
It refers to how much the LLM might generate
By default, openai has max_tokens of 256, so num_output is also 256
I should also mention these numbers are all measuring tokens lol
ok gotcha so to clarify is num_output just the other side of max_input_size? Like maybe num_output should be called max_output_size?
and is 256 for an ADA model? So a GPT 3.0 model would be 4096?
It's not quite the other side. If I set num_output to 1000, then it takes max_input_size (4096 for openai) and tries to ensure that the prompts sent are at most 4096-1000=3096 tokens long
The term max_output_size kind of makes sense from a user's perspective, yea
Ok so num_output is the remaining tokens available after you account for prompt length
Yea that's it πŸ’ͺ
Is there a reason that num_output can't be calculated automatically? I'm just wishing it was optional because I don't really care how long the output is. I'd like it to be as long as it needs to be.
So, let's say you retrieve your top k nodes, and you have compact mode on.

How much text do you put in each call to the LLM as context? You can calculate the length of the prompt template and query tokens, and that's the minimum length since those can't be trimmed

But the context can be trimmed. So llama index uses max_input_size and num_output to figure out how big each piece of context should be
So... I don't see how this can be automatic πŸ˜… the bigger num_output gets, the less context you can include in each LLM call
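To make that concrete, here's a rough recreation of the chunk-size calculation quoted earlier (the numbers are illustrative, not defaults pulled from llama_index):

max_input_size = 4096      # model context window
num_output = 1000          # tokens reserved for the response
num_prompt_tokens = 200    # prompt template + query tokens, which can't be trimmed
num_chunks = 3             # top-k retrieved nodes packed into the call

# Context budget per chunk: whatever is left after the fixed prompt text
# and the reserved output, split evenly across the chunks.
chunk_budget = (max_input_size - num_prompt_tokens - num_output) // num_chunks
print(chunk_budget)  # (4096 - 200 - 1000) // 3 == 965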
hmmm ok yes I think I follow that
If you need to change it because your responses are getting cut off, somewhere around 512 is usually a good size for most use cases
Just make sure max_tokens and num_output are the same value and you are good to go
do you mean max_input_size?
or is there a max_tokens value I can set somewhere? In which case, not sure what that refers to haha
oh it's passed to the llm class
nevermind I got it πŸ‘ πŸ˜„
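For reference, a minimal sketch of wiring the two values together with the older ServiceContext-style llama_index API (exact imports and class names vary by version, so treat this as an assumption and check your version's docs):

from langchain.llms import OpenAI
from llama_index import LLMPredictor, PromptHelper, ServiceContext

num_output = 512  # assumed value; keep it in sync with the LLM's max_tokens

# max_tokens on the LLM and num_output in PromptHelper are set to the same value
llm_predictor = LLMPredictor(llm=OpenAI(model_name="text-davinci-003", max_tokens=num_output))
prompt_helper = PromptHelper(max_input_size=4096, num_output=num_output, max_chunk_overlap=20)
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor, prompt_helper=prompt_helper
)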