The community members are discussing how to include a memory buffer along with a hybrid retriever in a Chat Engine - Condense Plus Context Chat. They share some code examples, and discuss the challenges of setting a token limit on the memory buffer to avoid the "context limit exceeded" error. The community members confirm that the token limit will only remove entire messages, not truncate them, and that the memory buffer only stores the previous Q&A, not the retrieved context. There is no explicitly marked answer in the comments.
you can set a token_limit on the memory, but you also need to be careful about what the top-k is on your retriever (too much context will also cause issues)
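For illustration, a minimal sketch of those two knobs, assuming a pre-0.10 LlamaIndex import layout and an index built elsewhere (the numbers are hypothetical, not recommendations):

```python
from llama_index.memory import ChatMemoryBuffer

# Hypothetical values: cap the chat history at ~3000 tokens, and keep top-k
# small on each retriever feeding the hybrid setup so that history plus
# retrieved context stays inside the model's context window.
memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
vector_retriever = index.as_retriever(similarity_top_k=3)  # assumes `index` was built earlier
```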
```python
memory = ChatMemoryBuffer.from_defaults(token_limit=4096)

if st.session_state.messages[-1]["role"] != "assistant":
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            llm = OpenAI(model="gpt-3.5-turbo")
            service_context = ServiceContext.from_defaults(llm=llm)
            query_engine = RetrieverQueryEngine.from_args(
                retriever=hybrid_retriever, service_context=service_context
            )
            chat_engine = CondensePlusContextChatEngine.from_defaults(
                query_engine, memory=memory, system_prompt=context_prompt
            )
            response, chat_messages = chat_engine.chat(str(prompt))
            if "not mentioned in" in response.response or "I don't know" in re…
```
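For comparison, a sketch of the same wiring that passes the hybrid retriever straight into the chat engine. This assumes a pre-0.10 LlamaIndex layout (hence ServiceContext), that hybrid_retriever, context_prompt, and prompt are defined as in the snippet above, and that from_defaults takes a retriever as its first argument, so the RetrieverQueryEngine wrapper isn't needed:

```python
import streamlit as st
from llama_index import ServiceContext
from llama_index.llms import OpenAI
from llama_index.memory import ChatMemoryBuffer
from llama_index.chat_engine import CondensePlusContextChatEngine

llm = OpenAI(model="gpt-3.5-turbo")
service_context = ServiceContext.from_defaults(llm=llm)
memory = ChatMemoryBuffer.from_defaults(token_limit=4096)

# CondensePlusContextChatEngine does its own retrieval, so the hybrid
# retriever is passed directly rather than wrapped in a query engine.
chat_engine = CondensePlusContextChatEngine.from_defaults(
    hybrid_retriever,                 # assumed to be built earlier in the app
    service_context=service_context,
    memory=memory,
    system_prompt=context_prompt,     # same prompt variable as in the snippet above
)

# chat() returns a single response object; the answer text is on .response
response = chat_engine.chat(str(prompt))
st.write(response.response)
```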
so the token limit of 4096 in memory = ChatMemoryBuffer.from_defaults(token_limit=4096) is there to make sure the previous Q&A history does not exceed 4096 tokens?
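As confirmed in the thread, the buffer only ever removes entire messages once the history goes over the limit; it never truncates a message part-way. A small self-contained sketch (with a deliberately tiny, hypothetical token_limit; exact counts depend on the tokenizer) illustrates the behavior:

```python
from llama_index.llms import ChatMessage
from llama_index.memory import ChatMemoryBuffer

# Deliberately tiny limit (hypothetical) so trimming kicks in quickly.
memory = ChatMemoryBuffer.from_defaults(token_limit=20)

memory.put(ChatMessage(role="user", content="First question, long enough to use up a fair number of tokens on its own."))
memory.put(ChatMessage(role="assistant", content="First answer, also long enough to use up a fair number of tokens."))
memory.put(ChatMessage(role="user", content="Second question"))
memory.put(ChatMessage(role="assistant", content="Second answer"))

# get() applies the token limit: the oldest messages are dropped whole until
# the remaining history fits; nothing is cut off mid-message.
print([m.content for m in memory.get()])   # typically only the most recent turns
# get_all() still returns the full, untrimmed history.
print(len(memory.get_all()))               # 4
```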