@Logan M max is 4096, right?
Not quite. The input and output are connected
The model generates one token at a time, adds it to the input, and generates the next token
So the prompt needs to "have room" to generate tokens. The max input size is 4096
That's what num_output does in the prompt helper: it leaves room to generate a certain number of tokens
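Roughly like this (a minimal sketch, assuming the older llama_index PromptHelper API; exact names can vary between versions):
```python
from llama_index import PromptHelper

# max_input_size: the model's total context window (4096 for text-davinci-003)
# num_output: tokens reserved for the generated answer
# max_chunk_overlap: token overlap between chunks when text is split
prompt_helper = PromptHelper(
    max_input_size=4096,
    num_output=256,
    max_chunk_overlap=20,
)
```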
@Logan M So what is the suggested max_tokens and num_output?
Since it has to apply to both the embedded index query and the non-index summary prompt.
@Logan M In the summary prompt case, the response is not so long.
Usually people have max_tokens anywhere from the default (256) up to around 1500ish tokens. Beyond that, you will face a lot of difficulties
Try setting it to 512, see how that works
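Something like this (a sketch assuming the LangChain OpenAI wrapper that LLMPredictor used at the time; parameter names may differ in your version):
```python
from langchain.llms import OpenAI
from llama_index import LLMPredictor

# max_tokens controls how many completion tokens the LLM may generate
# (the LangChain default is 256; bump it to 512 here)
llm_predictor = LLMPredictor(
    llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=512)
)
```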
@Logan M will increasing tokens reduce efficiency or performance?
@Logan M So if I set max_tokens to 512, should I set num_output to 512 as well? Previously I set max_tokens to the default and num_output to only 48, and it worked fine.
Yea set both to the same value, just to be safe.
Hmmm, responses might be slightly slower, but probably not too much impact
Thanks Logan. Then why do we need to set max_tokens and num_output to the same value?
@Logan M I set both max_tokens and num_output to 512, and max_chunk_overlap stays the same as before at 20. And then I hit this error.
are you also setting chunk_size_limit? The math gets a little crazy when these parameters are adjusted lol
Also, maybe pass service_context into the query as well; I remember that fixed an issue for another person
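Roughly (a sketch under the same API assumptions as above; whether query accepts service_context depends on the llama_index version, and "data" and the question are placeholder values):
```python
from llama_index import (
    GPTSimpleVectorIndex,
    LLMPredictor,
    PromptHelper,
    ServiceContext,
    SimpleDirectoryReader,
)
from langchain.llms import OpenAI

# Keep max_tokens and num_output in sync, as discussed above
prompt_helper = PromptHelper(max_input_size=4096, num_output=512, max_chunk_overlap=20)
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, max_tokens=512))
service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    prompt_helper=prompt_helper,
)

documents = SimpleDirectoryReader("data").load_data()
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

# Pass the same service_context into the query as well
response = index.query("your question here", service_context=service_context)
```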
@Logan M Thanks Logan. Currently I see that if I only update max_tokens to 512 but keep num_output at 48, it works fine. Just want to know what the risk is if these values aren't the same?
I think the risk is that possibly there won't be enough room to generate 512 tokens. But if it works... 🤷‍♂️
num_output ensures that every input sent to OpenAI has a maximum length of max_input_size minus num_output (of course, the input can also be shorter than this, leaving even more room)
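For example, with the numbers from this thread (just the arithmetic, nothing library-specific):
```python
max_input_size = 4096  # model context window
num_output = 512       # tokens reserved for the answer

# Largest prompt llama_index will send to the model
max_prompt_tokens = max_input_size - num_output
print(max_prompt_tokens)  # 3584
```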