ChatMemory Buffer

Hello, I tried using the LlamaIndex OpenAI chat engine, but I am encountering one problem. If I start chatting in a way that the responses become long, it seems like the context string that gets passed to the LLM becomes too long and I hit a token limit error.
Plain Text
openai.error.InvalidRequestError: This model's maximum context length is 4097 tokens. However, your messages resulted in 4121 tokens (4070 in the messages, 51 in the functions). Please reduce the length of the messages or functions.

Has this happened to anyone else and what could I do to fix it?
5 comments
Which ChatEngine are you using?
You need to set a token limit for the chat memory.

https://docs.llamaindex.ai/en/stable/examples/chat_engine/chat_engine_context.html

Plain Text
from llama_index.memory import ChatMemoryBuffer

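# Cap the chat history at 3000 tokens so the full prompt stays under the model's 4097-token limit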
memory = ChatMemoryBuffer.from_defaults(token_limit=3000)

chat_engine = index.as_chat_engine(
    chat_mode="context",
    memory=memory,
    system_prompt="You are a chatbot, able to have normal interactions, as well as talk about an essay discussing Paul Graham's life.",
)

This should solve your case!
That is model dependent.
gpt-3.5 only has a 4097 token maximum.
gpt-4, however, has twice as much.
You can either change the model (gpt-3.5-turbo-16k could be a good option), change the memory buffer, or make your prompts smaller.
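For example, here is a rough sketch of the larger-model route, assuming the legacy llama_index ServiceContext API (exact import paths depend on your llama_index version, and the "./data" folder is just a placeholder):
Plain Text
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms import OpenAI
from llama_index.memory import ChatMemoryBuffer

# Swap in a model with a 16k context window instead of the 4k gpt-3.5-turbo
service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo-16k", temperature=0)
)

# "./data" is a placeholder for wherever your documents live
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# With the bigger context window, the memory buffer can be allowed to grow too
memory = ChatMemoryBuffer.from_defaults(token_limit=12000)

chat_engine = index.as_chat_engine(
    chat_mode="context",
    memory=memory,
)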
Yep, that's why they can set the token limit to match their choice of model.
Thanks, I'll try it. I actually updated to the latest version and it seems like I'm not hitting the error now, but I'll keep it in mind if it comes up again.
Yeah, the new version has a 1500 token limit set by default, I guess.