I looked a bit into the documentation and found that this is the way to define memory and manage the chat history:
from llama_index.core.memory import ChatMemoryBuffer

# Keep at most ~1500 tokens of chat history; the oldest messages are dropped first
memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    memory=memory,
    context_prompt=(
        "You are a chatbot, able to have normal interactions.\n"
        "Here are the relevant documents for the context:\n"
        "{context_str}"
        "\nInstruction: Based on the above documents, provide a detailed answer for the user question below."
    ),
    verbose=True,
)
However, it still saturates the memory for me: when I tried it, the context filled up after roughly 7 to 9 questions. The LLM's context length is 2k tokens (I used a Modelfile to change it for Llama 3 70B on Ollama).
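My guess (an assumption, not something I found stated in the docs): with a 2k context window, a token_limit of 1500 leaves only about 500 tokens for the retrieved documents, the condensed question, the system prompt, and the model's answer, so overflowing after a handful of turns is expected. Below is a sketch that budgets more conservatively by shrinking both the history buffer and the retrieved context; note that similarity_top_k being forwarded to the underlying retriever through as_chat_engine is also an assumption worth verifying against your LlamaIndex version:

# Sketch: fit history + retrieved context into a 2k-token window.
# The numbers are illustrative assumptions, not recommendations from the docs.
memory = ChatMemoryBuffer.from_defaults(token_limit=500)  # leave room for context and prompts

chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    memory=memory,
    similarity_top_k=2,  # fewer retrieved chunks -> smaller {context_str}
    context_prompt=(
        "You are a chatbot, able to have normal interactions.\n"
        "Here are the relevant documents for the context:\n"
        "{context_str}"
        "\nInstruction: Based on the above documents, provide a detailed answer for the user question below."
    ),
    verbose=True,
)

Alternatively, raising num_ctx in the Ollama Modelfile (Llama 3 natively supports an 8k window) gives both the history and the retrieved context more room.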