@Logan M max is 4096, right?
Not quite. The input and output are connected
The model generates one token at a time, adds it to the input, and generates the next token
So the prompt needs to "have room" to generate tokens. The max input size is 4096
That's what num_output does in the prompt helper: it leaves room to generate a certain number of tokens
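Roughly like this (a minimal sketch, assuming the older llama_index PromptHelper API; exact names can vary between versions):
```python
from llama_index import PromptHelper

# max_input_size: the model's total context window (4096 for text-davinci-003)
# num_output: tokens reserved for the generated answer
# max_chunk_overlap: token overlap between chunks when text is split
prompt_helper = PromptHelper(
    max_input_size=4096,
    num_output=256,
    max_chunk_overlap=20,
)
```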
@Logan M So what is the suggested max_tokens and num_output?
Since it has to apply to both the embedded index query and the non-index summary prompt.
@Logan M In the summary prompt case, the response is not so long.
Usually people have max_tokens anywhere from the default (256) up to around 1500ish tokens. Beyond that, you will face a lot of difficulties
Try setting it to 512, see how that works
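Something like this (a sketch assuming the LangChain OpenAI wrapper that LLMPredictor used at the time; parameter names may differ in your version):
```python
from langchain.llms import OpenAI
from llama_index import LLMPredictor

# max_tokens controls how many completion tokens the LLM may generate
# (the LangChain default is 256; bump it to 512 here)
llm_predictor = LLMPredictor(
    llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=512)
)
```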
@Logan M will increasing tokens reduce efficiency or performance?
@Logan M So if I set max_tokens to 512, should I set num_output to 512 as well? Previously I set max_tokens to the default and num_output to only 48, and it worked fine.
Yea set both to the same value, just to be safe.
Hmmm, responses might be slightly slower, but probably not too much impact
Thanks Logan. Then why do we need to set max_tokens and num_output to the same value?
@Logan M I set both max_tokens and num_output to 512, and max_chunk_overlap stays the same as before at 20. And then I hit this error.
are you also setting chunk_size_limit? The math gets a little crazy when these parameters are adjusted lol
Also, maybe pass service_context into the query as well; I remember that fixed an issue for another person
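Roughly (a sketch under the same API assumptions as above; whether query accepts service_context depends on the llama_index version, and "data" and the question are placeholder values):
```python
from llama_index import (
    GPTSimpleVectorIndex,
    LLMPredictor,
    PromptHelper,
    ServiceContext,
    SimpleDirectoryReader,
)
from langchain.llms import OpenAI

# Keep max_tokens and num_output in sync, as discussed above
prompt_helper = PromptHelper(max_input_size=4096, num_output=512, max_chunk_overlap=20)
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, max_tokens=512))
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    prompt_helper=prompt_helper,
)

documents = SimpleDirectoryReader("data").load_data()
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

# Pass the same service_context into the query as well
response = index.query("your question here", service_context=service_context)
```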
@Logan M Thanks Logan. Currently I see that if I only update max_tokens to 512 but keep num_output at 48, it works fine. Just want to know what the risk is if these values aren't the same?
I think the risk is that possibly there won't be enough room to generate 512 tokens. But if it works... 🤷‍♂️
num_output ensures that every input sent to OpenAI has a maximum length of max_input_size minus num_output (of course, the input can also be shorter than this, leaving even more room)
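For example, with the numbers from this thread (just the arithmetic, nothing library-specific):
```python
max_input_size = 4096  # model context window
num_output = 512       # tokens reserved for the answer

# Largest prompt llama_index will send to the model
max_prompt_tokens = max_input_size - num_output
print(max_prompt_tokens)  # 3584
```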