
llama peeps, can someone explain the relationship between the 'max_tokens' argument of the LLM class and the 'context_window' and 'num_output' arguments of the PromptHelper? I keep getting this error:
InvalidRequestError: This model's maximum context length is 4097 tokens. However, you requested 4529 tokens (529 in the messages, 4000 in the completion). Please reduce the length of the messages or completion.
my llm definition:
llm = OpenAI(temperature=0.1, model="gpt-3.5-turbo", max_tokens=3000)
my prompt_helper:
prompt_helper = PromptHelper(context_window=4097, num_output=1000, tokenizer=tiktoken.encoding_for_model('text-davinci-002').encode, chunk_overlap_ratio=0.01)
I don't understand the '529' number or where the '4000' is coming from. thanks kindly!
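Spelled out, the error is just reporting that the two numbers it lists add up past the model limit. A quick sketch using only the figures from the error message itself (the variable names are purely illustrative):

prompt_tokens = 529          # "529 in the messages"
requested_completion = 4000  # "4000 in the completion"
context_limit = 4097         # gpt-3.5-turbo's maximum context length
print(prompt_tokens + requested_completion > context_limit)  # True: 4529 > 4097, hence the error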
7 comments
oy, this one is a bit confusing for me too. Hopefully someone can answer more thoroughly, but from what I understand, max_tokens on the LLM constrains the tokens being passed into the LLM. Then num_output is how much room it will leave for output tokens, and context_window is basically the LLM's total context size.
I've had good experience setting max_tokens to None and letting it figure that out on its own.
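Roughly, the way these three settings interact looks like this (a minimal sketch, assuming PromptHelper's usual behaviour of reserving num_output tokens out of context_window for the completion; the numbers are the ones from the thread):

context_window = 4097                               # the model's hard context limit
num_output = 1000                                   # room PromptHelper reserves for the completion
available_for_prompt = context_window - num_output  # what's left for the query + retrieved text
print(available_for_prompt)                         # 3097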
thanks for sharing @bmax, I'll keep digging and hopefully someone else will chime in 😄
blarg, just tried max_tokens=None; no change, same error 😒
try context_window = 4097 - num_output
529 I believe is the source nodes you're sending??
num_output and max_tokens should be the same.

In the sample code above, you've requested 3000 completion tokens (max_tokens=3000) but only left room for 1000 (num_output=1000). (Ngl, I have no idea where the 4000 is coming from either lol)
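Putting that advice together, one way to keep the two output budgets in sync is to drive both from a single number. This is a sketch only: the import paths differ across llama_index versions, and output_budget and the value 1000 are arbitrary choices for illustration, not anything from the thread.

import tiktoken
from llama_index import PromptHelper  # import path may differ in your llama_index version
from llama_index.llms import OpenAI

output_budget = 1000  # shared completion budget (arbitrary example value)

llm = OpenAI(temperature=0.1, model="gpt-3.5-turbo", max_tokens=output_budget)
prompt_helper = PromptHelper(
    context_window=4097,
    num_output=output_budget,  # matches the LLM's max_tokens, per the comment above
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode,  # tokenizer for the model actually in use
    chunk_overlap_ratio=0.01,
)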