If you want to maintain a conversation history while using the `CitationQueryEngine`, you can consider using the `ChatEngine` abstraction provided by LlamaIndex. The `CondensePlusContextChatEngine`, for example, maintains conversation history through a memory module such as a buffer memory. It performs retrieval given a condensed question, and feeds the condensed question + retrieved context + chat history into the final LLM prompt.

```python
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.chat_engine import CondensePlusContextChatEngine

memory = ChatMemoryBuffer.from_defaults(token_limit=3900)

chat_engine = CondensePlusContextChatEngine.from_defaults(
    index.as_retriever(),
    memory=memory,
    llm=llm,
    context_prompt=(
        "You are a chatbot, able to have normal interactions, as well as talk"
        " about the Kendrick and Drake beef."
        " Here are the relevant documents for the context:\n"
        "{context_str}"
        "\nInstruction: Use the previous chat history, or the context above, to interact and help the user."
    ),
    verbose=True,
)
```
You can then use the `chat` method of the `chat_engine` to maintain a conversation history:

```python
response = chat_engine.chat(
    "Tell me about the songs Drake released in the beef."
)
print(str(response))

response = chat_engine.chat("What about Kendrick?")
print(str(response))
```
Please note that this example does not use the `CitationQueryEngine`. You should replace `index.as_retriever()` and `llm` with your own retriever and LLM. The `context_prompt` should also be adjusted according to your specific use case.
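As a rough sketch of what swapping in your own components could look like (the OpenAI model name and `similarity_top_k` value below are placeholders, and the example assumes the `llama-index-llms-openai` integration is installed):

```python
from llama_index.llms.openai import OpenAI

# Placeholder LLM and retriever settings -- swap in whatever fits your setup
llm = OpenAI(model="gpt-4o-mini")
retriever = index.as_retriever(similarity_top_k=5)

chat_engine = CondensePlusContextChatEngine.from_defaults(
    retriever,
    memory=memory,
    llm=llm,
    verbose=True,
)
```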
If you want to use the `CitationQueryEngine` while also keeping track of the conversation history, you can consider using the `CondenseQuestionChatEngine` provided by LlamaIndex. This chat engine generates a standalone question from the conversation context and the last message, then queries the wrapped query engine for a response. Here is an example of how you can use the `CondenseQuestionChatEngine`:

```python
from llama_index.core.chat_engine import CondenseQuestionChatEngine
from llama_index.core.memory import ChatMemoryBuffer

# Initialize your memory buffer
memory = ChatMemoryBuffer.from_defaults(token_limit=3900)

# Initialize your chat engine
chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=citation_query_engine,  # your CitationQueryEngine instance
    memory=memory,
    verbose=True,
)

# Use the chat engine
response = chat_engine.chat("Your question here")
print(response)
```
In this code, `citation_query_engine` should be your instance of `CitationQueryEngine`. The `chat` method of `chat_engine` will maintain a conversation history and provide a response with citations.
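For reference, a minimal sketch of constructing that instance might look like the following; the `similarity_top_k` and `citation_chunk_size` values are illustrative, not required settings:

```python
from llama_index.core.query_engine import CitationQueryEngine

# Build the query engine that supplies the citation behaviour;
# `index` is the same index used in the examples above.
citation_query_engine = CitationQueryEngine.from_args(
    index,
    similarity_top_k=3,       # illustrative: how many chunks to retrieve
    citation_chunk_size=512,  # illustrative: how finely sources are split for citing
)
```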
Please note that the `CondenseQuestionChatEngine` does not directly provide the citation functionality; the citations come from the `CitationQueryEngine` itself. The `CondenseQuestionChatEngine` simply condenses the conversation into a standalone question and uses the `CitationQueryEngine` to generate the responses.
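If you want to inspect the cited sources, the chat response should carry the source nodes returned by the underlying query engine; a quick sketch (attribute availability may vary slightly across LlamaIndex versions):

```python
response = chat_engine.chat("Your question here")
print(response)

# The cited chunks are attached to the response as source nodes
for i, source_node in enumerate(response.source_nodes, start=1):
    print(f"Source [{i}] (score={source_node.score}):")
    print(source_node.node.get_content()[:200])
```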
The `condense_question_prompt` is a template that instructs the model on how to condense the conversation history and the latest user message into a standalone question. Here is an example of how you can define a `condense_question_prompt`:

```python
from llama_index.core import PromptTemplate

condense_question_prompt = PromptTemplate(
    """\
Given a conversation (between Human and Assistant) and a follow up message from Human, \
rewrite the message to be a standalone question that captures all relevant context \
from the conversation.

<Chat History>
{chat_history}

<Follow Up Message>
{question}

<Standalone question>
"""
)
```
In this prompt, `{chat_history}` will be replaced with the actual conversation history and `{question}` will be replaced with the latest user message. The model will then generate a standalone question based on this prompt.

You can then pass this `condense_question_prompt` to the `CondenseQuestionChatEngine`:

```python
chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=query_engine,  # e.g. your CitationQueryEngine instance
    condense_question_prompt=condense_question_prompt,
    chat_history=custom_chat_history,  # optional pre-existing conversation
    verbose=True,
)
```
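If you do want to seed the engine with prior turns, `custom_chat_history` is a list of `ChatMessage` objects defined before the engine is constructed; a small sketch with placeholder message contents:

```python
from llama_index.core.llms import ChatMessage, MessageRole

# Placeholder conversation used to seed the engine (define before from_defaults)
custom_chat_history = [
    ChatMessage(
        role=MessageRole.USER,
        content="Hello assistant, we are talking about the Kendrick and Drake beef.",
    ),
    ChatMessage(role=MessageRole.ASSISTANT, content="Okay, sounds good."),
]
```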