llama peeps, can someone explain the relationship between the 'max_tokens' argument of the LLM class versus the 'context_window' and 'num_output' arguments of the PromptHelper? I keep getting this error:

InvalidRequestError: This model's maximum context length is 4097 tokens. However, you requested 4529 tokens (529 in the messages, 4000 in the completion). Please reduce the length of the messages or completion.

My llm definition:

llm = OpenAI(temperature=0.1, model="gpt-3.5-turbo", max_tokens=3000)

My prompt_helper:

prompt_helper = PromptHelper(
context_window = 4097,
num_output = 1000,
tokenizer = tiktoken.encoding_for_model('text-davinci-002').encode,
chunk_overlap_ratio = 0.01
)

I don't understand the '529' number or where the '4000' is coming from. Thanks kindly!
oy, this one is a bit confusing for me too. Hopefully someone can answer more thoroughly, but from what I understand: max_tokens on the LLM is the completion budget, i.e. how many tokens the model is allowed to generate, and it's what actually gets sent along with the API request. num_output is how much room the PromptHelper reserves for that output when it packs context into the prompt, and context_window is the total context size of the LLM, which has to hold the prompt plus the completion. Reading your error, the 529 is the prompt (the messages) and the 4000 is the completion budget the request asked for, so 529 + 4000 = 4529 > 4097. The odd part is that 4000 matches neither your max_tokens=3000 nor your num_output=1000, which makes me suspect the llm/prompt_helper you defined aren't actually the ones being used (e.g. they're not being passed through a service context), but someone who knows better should confirm.
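Something like this is the setup I had in mind. It's just a rough sketch assuming the legacy llama_index ServiceContext API (imports and names may differ across versions), not a drop-in fix for your exact code:

import tiktoken
from llama_index import PromptHelper, ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms import OpenAI

# One completion budget, used in both places: max_tokens is what the API request
# asks for, num_output is what the PromptHelper reserves when packing the prompt.
NUM_OUTPUT = 1000

llm = OpenAI(temperature=0.1, model="gpt-3.5-turbo", max_tokens=NUM_OUTPUT)

prompt_helper = PromptHelper(
    context_window=4096,    # total context of gpt-3.5-turbo (prompt + completion)
    num_output=NUM_OUTPUT,  # room left for the completion
    # use the tokenizer that matches the model you're actually calling
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode,
    chunk_overlap_ratio=0.01,
)

# Pass both through a ServiceContext so the query engine actually uses them;
# "./data" is just a placeholder path for illustration.
service_context = ServiceContext.from_defaults(llm=llm, prompt_helper=prompt_helper)
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
response = index.as_query_engine().query("your question here")

The main point is that whatever number you pick for the completion, use it for both max_tokens and num_output, and make sure your packed prompt plus that number stays under the model's 4097-token limit.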