How do I limit the responses so they are shorter?

@kapa.ai How do I limit the responses so they are shorter?
Please stand by @EnderEnder as I search the knowledge base. We are experiencing especially high traffic right now and might take longer than usual. :wave:
----------------------
To limit the responses so they are shorter, you can adjust the chunk_size_limit parameter when creating a ServiceContext. This parameter limits the size of text chunks used in the prompt. Here's an example of how to create a ServiceContext with a custom chunk_size_limit:

Plain Text
from gpt_index.indices.service_context import ServiceContext

# chunk_size_limit caps how large each text chunk placed in the prompt can be
service_context = ServiceContext.from_defaults(chunk_size_limit=your_desired_limit)


Replace your_desired_limit with the desired maximum chunk size.

Keep in mind that this option is not available/utilized in GPTTreeIndex. For more information on ServiceContext, refer to the documentation (https://gpt-index.readthedocs.io/en/latest/reference/service_context.html).
----------------------
Note: you can ask me a follow-up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai This limits the input size but not the output size.
Let me check the knowledge base for you. One moment! :books:
----------------------
Apologies for the confusion. To limit the output size, you can adjust the num_output parameter when defining a custom PromptHelper. This parameter reserves room in the prompt for the LLM's output tokens; the actual cap on generation comes from the matching max_tokens setting on the LLM itself, shown in the second snippet below. Here's an example of how to create a PromptHelper with a custom num_output:

Plain Text
from llama_index import PromptHelper

max_input_size = 4096                    # context window of the model
num_output = your_desired_output_tokens  # tokens reserved for the response
max_chunk_overlap = 20                   # token overlap between text chunks

prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)


Replace your_desired_output_tokens with the desired number of output tokens.

After creating the custom PromptHelper, you can use it when creating a ServiceContext and then use that ServiceContext when constructing your index. Here's an example:

Plain Text
from llama_index import LLMPredictor, GPTSimpleVectorIndex, ServiceContext
from langchain import OpenAI

# Define the LLM; max_tokens is what actually caps the length of each response
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=num_output))

service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)


For more information on PromptHelper, refer to the documentation (https://gpt-index.readthedocs.io/en/latest/reference/prompt_helper.html).
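
A quick usage sketch, assuming the same era of llama_index API as the snippets above (the query string is illustrative):

Plain Text
# Querying the index built above; the answer is capped by num_output/max_tokens
response = index.query("Give a one-paragraph summary of the documents.")
print(response)
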
----------------------
Note: you can ask me a follow-up question by @ mentioning me again :speech_balloon:
----------------------
For shrinking output sizes, you almost need to convince the model to talk less lol

Other methods will just cut the response short
Yeah, I am realising that. The sentence "Be as brief as possible." seems to be the most reliable.
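
A hedged sketch of that prompt-based approach, using the same era's QuestionAnswerPrompt to bake a brevity instruction into the QA template (the template wording and query are illustrative, not from the thread):

Plain Text
from llama_index import QuestionAnswerPrompt

QA_PROMPT_TMPL = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context, answer the question below. Be as brief as possible.\n"
    "Question: {query_str}\n"
)

# Pass the custom template at query time so every answer carries the instruction
response = index.query("your question", text_qa_template=QuestionAnswerPrompt(QA_PROMPT_TMPL))
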
I lied...
Attachment: Screenshot_2023-05-11_at_9.22.52_AM.png
It really likes macs... lol
Maybe you need a fallback, where if the response is too long, make an extra call to summarize the response lol
Or just more prompt engineering
I'll try the latter first. I haven't implemented streaming yet, so the response times are already back-breaking before I consider a refinement process or a multi-step decomp.
But good suggestions, thank you!
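
A minimal sketch of that fallback idea, assuming the same langchain OpenAI wrapper used earlier (enforce_limit and the 500-char default are hypothetical names/values):

Plain Text
from langchain import OpenAI

llm = OpenAI(temperature=0, model_name="text-davinci-003")

def enforce_limit(text: str, max_chars: int = 500) -> str:
    # Hypothetical fallback: if the answer runs long, spend one extra
    # LLM call condensing it rather than truncating it mid-sentence
    if len(text) <= max_chars:
        return text
    return llm(f"Rewrite the following in under {max_chars} characters, keeping the key facts:\n\n{text}")
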
Actually interesting, how does streaming work with multi-step?
It will only stream the final step. Which kinda sucks, but I'm not sure if there's a better way πŸ€”
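
For context, a hedged sketch of streaming in that era's API (streaming=True on query; with multi-step queries only this final synthesis step streams):

Plain Text
# Assumes the same era of llama_index API as the snippets above
response = index.query("your question", streaming=True)
for token in response.response_gen:
    print(token, end="", flush=True)
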
If you are trying to limit the response to a specific number of chars, I have had pretty good luck with that. I just had to handle a job with a bunch of questions that needed responses under a 2000-char limit, so I specified that in the prompt, or rather emphasized it. I wonder if using specific keywords may get the model to take that instruction more seriously; with the ChatOpenAI interface and the system prompt, there definitely seem to be certain keywords you can use in the instruction set. Next time I need to do something similar, I might use the keyword "Task", e.g. include an instruction like "Task: the response must be less than 2000 chars", and try different variations to ensure compliance, since depending on how you word your prompt, the model may not always comply with the instruction.
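
A hedged sketch of that "Task:" framing with the ChatOpenAI interface (the keyword is the commenter's heuristic, not a documented feature; the model default and question are illustrative):

Plain Text
from langchain.chat_models import ChatOpenAI
from langchain.schema import SystemMessage, HumanMessage

chat = ChatOpenAI(temperature=0)

messages = [
    # "Task:" framing is a heuristic for emphasis, not a documented keyword
    SystemMessage(content="Task: the response must be less than 2000 characters."),
    HumanMessage(content="your question here"),
]
print(chat(messages).content)
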