you can set a token_limit on the memory, but you also need to be careful about the top-k on your retriever (too many retrieved chunks will also overflow the context window)
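e.g. a rough sketch of what I mean (assuming a plain VectorStoreIndex and the pre-0.10 llama_index imports that match your ServiceContext usage; swap in your own hybrid_retriever):

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex
from llama_index.memory import ChatMemoryBuffer

# hypothetical index just for illustration -- you'd keep your hybrid_retriever
docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)

# keep top-k small so retrieved chunks + chat history fit in the model's window
retriever = index.as_retriever(similarity_top_k=3)

# cap the chat history itself at 4096 tokens
memory = ChatMemoryBuffer.from_defaults(token_limit=4096)
```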
```python
memory = ChatMemoryBuffer.from_defaults(token_limit=4096)

if st.session_state.messages[-1]["role"] != "assistant":
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            llm = OpenAI(model="gpt-3.5-turbo")
            service_context = ServiceContext.from_defaults(llm=llm)
            query_engine = RetrieverQueryEngine.from_args(
                retriever=hybrid_retriever, service_context=service_context
            )
            chat_engine = CondensePlusContextChatEngine.from_defaults(
                query_engine, memory=memory, system_prompt=context_prompt
            )
            response, chat_messages = chat_engine.chat(str(prompt))
            if "not mentioned in" in response.response or "I don't know" in re…
```
so the token_limit of 4096 in memory = ChatMemoryBuffer.from_defaults(token_limit=4096) is there to make sure the previous Q&A history doesn't exceed 4096 tokens?
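if you want to see that trimming behaviour on its own, here's a quick sketch with made-up messages and a tiny limit (same pre-0.10 imports assumed): get() only returns the most recent messages that still fit under token_limit, so the oldest Q&A pairs get dropped first.

```python
from llama_index.llms import ChatMessage
from llama_index.memory import ChatMemoryBuffer

# tiny limit just to make the trimming visible
memory = ChatMemoryBuffer.from_defaults(token_limit=50)

memory.put(ChatMessage(role="user", content="first question " * 30))
memory.put(ChatMessage(role="assistant", content="first answer " * 30))
memory.put(ChatMessage(role="user", content="second question"))

# get() drops the older, oversized messages and only returns what fits
print(memory.get())
```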