Wow, super useful! OK, so if I understand correctly, max_tokens (in the model settings) is not the sum of the prompt (query + prompt template + context) and the completion, but refers only to the completion, right?
In that case, if I don't want my text chunks to be split again and I want to set num_output to 1500, I could use, for example:
- a prompt_template of 500 tokens,
- a context (my text chunks) of 2000 tokens,
- a query of 50 tokens.
Right?
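To sanity-check the arithmetic: these numbers seem to assume a 4096-token context window (the original post doesn't state the model, so that limit is an assumption here). A minimal sketch of the budget, with illustrative variable names rather than any real API:

```python
# Hypothetical token-budget check. CONTEXT_WINDOW = 4096 is an assumed
# model limit (prompt + completion combined); adjust for your model.
CONTEXT_WINDOW = 4096
num_output = 1500        # tokens reserved for the completion (max_tokens)

prompt_template = 500    # tokens in the prompt template
context_chunks = 2000    # tokens of retrieved text chunks
query = 50               # tokens in the user query

# max_tokens covers only the completion; the prompt is counted separately.
prompt_tokens = prompt_template + context_chunks + query  # 2550
total = prompt_tokens + num_output                        # 4050

assert total <= CONTEXT_WINDOW, "prompt + completion exceeds the context window"
print(f"prompt={prompt_tokens}, completion={num_output}, total={total}")
```

So with these numbers the prompt uses 2550 tokens and the completion 1500, for 4050 total, which just fits under a 4096-token window.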