Updated 2 years ago

Hey again! Just a simple question: when I use davinci, I can set max_tokens = -1 and this returns complete responses. However, when I use GPT-3.5 Turbo, I cannot use -1 as max_tokens, so I have to put 4096 or whatever. The problem is that, this way, responses that are supposed to be long come back truncated instead. How can I solve this? Thank you!
You'll also need to adjust the num_output parameter in the prompt helper. But you can't set it to 4096, because then there's no room left for the actual prompt and context.

Try setting it to something like 1000 or 1500, along with setting max_tokens in the model. You might also need to decrease the chunk_size_limit if you hit other problems
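A minimal sketch of the arithmetic behind this advice, assuming a 4096-token context window that the prompt and the completion share. The helper name `prompt_room` is hypothetical and not part of LlamaIndex:

```python
CONTEXT_WINDOW = 4096  # assumed gpt-3.5-turbo window, shared by prompt and completion


def prompt_room(num_output: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for prompt template + context + query after reserving num_output."""
    if not 0 <= num_output <= context_window:
        raise ValueError("num_output must be between 0 and the context window")
    return context_window - num_output


for reserved in (4096, 1500, 1000):
    # Reserving the whole window for output leaves nothing for the prompt.
    print(f"num_output={reserved}: {prompt_room(reserved)} tokens left for the prompt")
```

With num_output = 4096 the prompt gets 0 tokens, which is why a value like 1000 or 1500 is suggested instead.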
Hey Logan, thanks for answering!
Going step by step:
  • in the prompt helper, I should set num_output (what is it for?) to a value between 1k and 1.5k.
  • for max_tokens in the model, can I use 4096?
  • where does chunk_size_limit go?
I mean, with davinci it's enough to set -1; why is it so difficult with gpt-3.5? 🥲
I have a feeling you just got lucky with davinci, and the responses were short enough not to cause issues.

  1. num_output leaves room in the prompt for the model to generate. For decoder models like GPT, the input and output share the same token window, so whatever is reserved for the output is taken away from the prompt.
  2. I would say no, haha. Use 1k or 1.5k here as well, the same as num_output.
  3. chunk_size_limit goes in the service context, something like ServiceContext.from_defaults(..., chunk_size_limit=1500)
See this message for a more detailed explanation I gave the other day 🙂 https://discord.com/channels/1059199217496772688/1095619508850413609/1095849289135173732
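Points 1 and 2 can be illustrated with a rough sketch. Token counts here are plain integers rather than real tokenizer output, the 4096 window is assumed, and `request_fits` is a hypothetical helper. If the prompt is packed up to the room that num_output leaves, any max_tokens larger than num_output pushes the request past the window:

```python
def request_fits(prompt_tokens: int, max_tokens: int, context_window: int = 4096) -> bool:
    """True if the prompt plus the largest allowed completion fits in the window."""
    return prompt_tokens + max_tokens <= context_window


num_output = 1500
# Assume the prompt is packed up to the room left after reserving num_output.
packed_prompt = 4096 - num_output

print(request_fits(packed_prompt, max_tokens=1500))  # max_tokens equal to num_output
print(request_fits(packed_prompt, max_tokens=4096))  # max_tokens far above num_output
```

Keeping max_tokens equal to num_output keeps the reserved space and the requested completion length in agreement.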
Wow, super useful! OK so, if I understood correctly, max_tokens (in the model settings) is not the sum of the prompt (query + prompt template + context) and the completion; it refers only to the completion, right?

So, if I don't want my text chunks to be chunked again and I want to set num_output to 1500, I should use, for example:
  • a prompt_template of 500 tokens,
  • context (my text chunks) of 2000 tokens,
  • a query of 50 tokens.
Right?
Yes you got that right 👍
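The budget above can be checked with simple addition. This sketch assumes a 4096-token window and uses the token counts from the example as given:

```python
CONTEXT_WINDOW = 4096  # assumed gpt-3.5-turbo context window

budget = {
    "prompt_template": 500,
    "context": 2000,     # retrieved text chunks
    "query": 50,
    "num_output": 1500,  # reserved for the completion
}

total = sum(budget.values())
# Everything must fit in the window for the context to go through unsplit.
print(f"total = {total}, headroom = {CONTEXT_WINDOW - total}")
```

This prints `total = 4050, headroom = 46`, so the chunks fit without being split again, but only just.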
And what if it exceeds the 4096 limit? Will only the text chunks (I mean the context) be chunked again?
Yes, the context should be chunked again into smaller pieces
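A simplified sketch of that re-chunking, assuming whitespace-separated words stand in for tokens and that each piece is then sent in its own call. It illustrates the idea only: LlamaIndex uses its own tokenizer and splitter, and `split_context` is a hypothetical name:

```python
def split_context(context: str, prompt_template: int, query: int,
                  num_output: int, context_window: int = 4096) -> list[str]:
    """Split context into pieces that each fit beside the template, query and reserved output."""
    room = context_window - prompt_template - query - num_output
    if room <= 0:
        raise ValueError("no room left for context; lower num_output or shorten the prompt")
    words = context.split()  # crude stand-in for real tokenization
    return [" ".join(words[i:i + room]) for i in range(0, len(words), room)]


context = " ".join(f"w{i}" for i in range(3000))  # 3000 "tokens" of context
pieces = split_context(context, prompt_template=500, query=50, num_output=1500)
print([len(p.split()) for p in pieces])
```

Here the room for context is 4096 - 500 - 50 - 1500 = 2046, so 3000 words become pieces of 2046 and 954. The template, query, and num_output stay fixed; only the context is split.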