llama peeps, can someone explain the relationship between the 'max_tokens' argument of the LLM class versus the 'context_window' and 'num_output' arguments of the PromptHelper? I keep getting this error:

InvalidRequestError: This model's maximum context length is 4097 tokens. However, you requested 4529 tokens (529 in the messages, 4000 in the completion). Please reduce the length of the messages or completion.

My llm definition:

llm = OpenAI(temperature=0.1, model="gpt-3.5-turbo", max_tokens=3000)

My prompt_helper:

prompt_helper = PromptHelper(
context_window = 4097,
num_output = 1000,
tokenizer = tiktoken.encoding_for_model('text-davinci-002').encode,
chunk_overlap_ratio = 0.01
)

I don't understand the '529' number or where the '4000' is coming from. Thanks kindly!
oy, this one is a bit confusing for me too. Hopefully someone can answer more thoroughly, but from what I understand: max_tokens on the LLM is the completion budget, i.e. how many tokens the model is allowed to generate, and it's what actually gets sent along with the API request. num_output is how much room the PromptHelper reserves for that output when it packs context into the prompt, and context_window is the total context size of the LLM, which has to hold the prompt plus the completion. Reading your error, the 529 is the prompt (the messages) and the 4000 is the completion budget the request asked for, so 529 + 4000 = 4529 > 4097. The odd part is that 4000 matches neither your max_tokens=3000 nor your num_output=1000, which makes me suspect the llm/prompt_helper you defined aren't actually the ones being used (e.g. they're not being passed through a service context), but someone who knows better should confirm.
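Something like this is the setup I had in mind. It's just a rough sketch assuming the legacy llama_index ServiceContext API (imports and names may differ across versions), not a drop-in fix for your exact code:

import tiktoken
from llama_index import PromptHelper, ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms import OpenAI

# One completion budget, used in both places: max_tokens is what the API request
# asks for, num_output is what the PromptHelper reserves when packing the prompt.
NUM_OUTPUT = 1000

llm = OpenAI(temperature=0.1, model="gpt-3.5-turbo", max_tokens=NUM_OUTPUT)

prompt_helper = PromptHelper(
    context_window=4096,    # total context of gpt-3.5-turbo (prompt + completion)
    num_output=NUM_OUTPUT,  # room left for the completion
    # use the tokenizer that matches the model you're actually calling
    tokenizer=tiktoken.encoding_for_model("gpt-3.5-turbo").encode,
    chunk_overlap_ratio=0.01,
)

# Pass both through a ServiceContext so the query engine actually uses them;
# "./data" is just a placeholder path for illustration.
service_context = ServiceContext.from_defaults(llm=llm, prompt_helper=prompt_helper)
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
response = index.as_query_engine().query("your question here")

The main point is that whatever number you pick for the completion, use it for both max_tokens and num_output, and make sure your packed prompt plus that number stays under the model's 4097-token limit.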