Some of my chat responses are getting cut off

Some of my chat responses are getting cut off when they are output. Is there a way to get around this without streaming the message?
Hmm, there could be a few issues:

  1. Not enough room to generate the full text
  2. The default max output is 256 tokens; you can change this, though, in the LLM definition:
from llama_index.llms import OpenAI  # import path for llama-index < 0.10
llm = OpenAI(model="gpt-3.5-turbo", temperature=0.2, max_tokens=300)
Ah okay, I have been using Llama 2 7B from Hugging Face instead. When you say not enough room to generate the full text, is that due to an overload of input?
Not due to an overload of the prompt, just more like not enough room.

Like by default in our query engines, llama-index tries to leave room for num_output tokens to be generated.

In the chat engines, there is less of a restriction, so you may need to limit the chat history a bit more?
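
For example, here is a minimal sketch of capping the chat history with a ChatMemoryBuffer, assuming a pre-0.10 llama-index install and an index that already exists; the token_limit value is just illustrative:

from llama_index.memory import ChatMemoryBuffer

# Keep only the most recent ~1500 tokens of history so the model still has
# room to generate its response
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

chat_engine = index.as_chat_engine(chat_mode="context", memory=memory)
response = chat_engine.chat("Summarize the document.")

# The history can also be cleared between conversations
chat_engine.reset()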
Okay, sounds good, thank you. I have tried resetting the chat memory each time, which does not seem to be working, but I have also included a limit in the prompt.
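
For the Llama 2 7B setup mentioned above, the equivalent knob to max_tokens is max_new_tokens on HuggingFaceLLM. A minimal sketch, again assuming a pre-0.10 llama-index install; the model and tokenizer names are illustrative:

from llama_index.llms import HuggingFaceLLM

llm = HuggingFaceLLM(
    model_name="meta-llama/Llama-2-7b-chat-hf",      # illustrative model id
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    context_window=4096,   # Llama 2 context size
    max_new_tokens=512,    # raise this if responses are cut off
    device_map="auto",
)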