Some of my chat responses are getting cut off

Some of my chat responses are getting cut off when they are output. Is there a way to get around this without streaming the message?
Hmm, there could be a few issues:

  1. Not enough room to generate the full text
  2. The default max output is 256 tokens; you can change this, though, in the LLM definition:
from llama_index.llms import OpenAI  # import path for llama-index < 0.10
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.2, max_tokens=300)
Ah okay, I have been using Llama 2 7B from Hugging Face instead. When you say not enough room to generate the full text, is that due to an overload of input?
Not due to an overload of the prompt, just more like not enough room.

Like by default in our query engines, llama-index tries to leave room for num_output tokens to be generated.

In the chat engines, there is less of a restriction, so you may need to limit the chat history a bit more?
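
For example, here is a minimal sketch of capping the chat history with a ChatMemoryBuffer, assuming a pre-0.10 llama-index install and an index that already exists; the token_limit value is just illustrative:

from llama_index.memory import ChatMemoryBuffer

# Keep only the most recent ~1500 tokens of history so the model still has
# room to generate its response
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

chat_engine = index.as_chat_engine(chat_mode="context", memory=memory)
response = chat_engine.chat("Summarize the document.")

# The history can also be cleared between conversations
chat_engine.reset()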
Okay, sounds good, thank you. I have tried resetting the chat memory each time, which does not seem to be working, but I have also included a limit in the prompt.
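
For the Llama 2 7B setup mentioned above, the equivalent knob to max_tokens is max_new_tokens on HuggingFaceLLM. A minimal sketch, again assuming a pre-0.10 llama-index install; the model and tokenizer names are illustrative:

from llama_index.llms import HuggingFaceLLM

llm = HuggingFaceLLM(
    model_name="meta-llama/Llama-2-7b-chat-hf",      # illustrative model id
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    context_window=4096,   # Llama 2 context size
    max_new_tokens=512,    # raise this if responses are cut off
    device_map="auto",
)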