
Updated 5 months ago

I am currently using CitationQueryEngine, which provides a nice citation. However, it does

At a glance

The community member is currently using CitationQueryEngine, which provides citations but lacks conversation history like a chat engine. They are seeking ideas or code to use CitationQueryEngine while maintaining a conversation history, similar to a chat engine. The community members discuss using the ChatEngine abstraction from LlamaIndex to maintain conversation history, and the CondenseQuestionChatEngine to preserve the citation functionality while keeping track of the conversation history. An example of using CondenseQuestionChatEngine is provided, and the community members discuss the condense_question_prompt parameter for the CondenseQuestionChatEngine constructor.

@kapa.ai I am currently using CitationQueryEngine, which provides nice citations. However, it does not keep a conversation history the way a chat engine does. Can you give me some ideas or code that would let me use CitationQueryEngine while also maintaining context or a conversation history, so that it has the abilities of a chat engine as well?
10 comments
Please stand by @ihshan :) as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
To achieve the functionality of maintaining a conversation history while using the CitationQueryEngine, you can consider using the ChatEngine abstraction provided by LlamaIndex. The ChatEngine maintains conversation history through a memory module like buffer memory. It performs retrieval given a condensed question, and feeds the condensed question + context + chat history into the final LLM prompt.
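
As a rough illustration of the buffer-memory idea mentioned above, here is a minimal pure-Python sketch. It is not the LlamaIndex implementation: `ToyChatMemory` and its whitespace-based token count are hypothetical stand-ins, and the sketch only assumes that a buffer memory returns the most recent messages that fit within a token limit.

```python
class ToyChatMemory:
    """Hypothetical stand-in for a buffer memory with a token budget."""

    def __init__(self, token_limit):
        self.token_limit = token_limit
        self.messages = []  # (role, text) pairs, oldest first

    @staticmethod
    def count_tokens(text):
        # Assumption: whitespace-separated words approximate tokens.
        return len(text.split())

    def put(self, role, text):
        self.messages.append((role, text))

    def get(self):
        # Walk backwards from the newest message and keep what fits.
        kept = []
        used = 0
        for role, text in reversed(self.messages):
            cost = self.count_tokens(text)
            if used + cost > self.token_limit:
                break
            kept.append((role, text))
            used += cost
        return list(reversed(kept))


memory = ToyChatMemory(token_limit=8)
memory.put("user", "Tell me about the cheese factory")
memory.put("assistant", "It costs $500 [1]")
memory.put("user", "Who approved it?")

# Only the newest messages that fit the budget are returned.
recent = memory.get()
print(recent)
```

With a budget of 8 toy tokens, the oldest message no longer fits, so only the last two turns would be fed into the next prompt.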

Here is an example of how to create a stateful chatbot from a RAG pipeline, with the chat engine abstraction:

Plain Text
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.chat_engine import CondensePlusContextChatEngine

memory = ChatMemoryBuffer.from_defaults(token_limit=3900)

chat_engine = CondensePlusContextChatEngine.from_defaults(
    index.as_retriever(),
    memory=memory,
    llm=llm,
    context_prompt=(
        "You are a chatbot, able to have normal interactions, as well as talk"
        " about the Kendrick and Drake beef.\n"
        "Here are the relevant documents for the context:\n"
        "{context_str}"
        "\nInstruction: Use the previous chat history, or the context above, to interact and help the user."
    ),
    verbose=True,
)

You can then use the chat method of the chat_engine to maintain a conversation history:

Plain Text
response = chat_engine.chat(
    "Tell me about the songs Drake released in the beef."
)
print(str(response))

response = chat_engine.chat("What about Kendrick?")
print(str(response))

This code will maintain a conversation history and feed it into the final LLM prompt, similar to what you're looking for with the CitationQueryEngine.

Please note that you would need to replace index.as_retriever() and llm with your own retriever and LLM. The context_prompt should also be adjusted according to your specific use case.
For more details, you can refer to the LlamaIndex documentation and the example notebooks on GitHub.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai that uses a full chat engine, but it loses the abilities of CitationQueryEngine.

Using CitationQueryEngine gives output like the following, with a citation number placed next to each sentence automatically: "The price of building a new cheese factory is $500 [1] and board has recommended some strategic plan [2]"

However, the chat engine does not give me output like the above, where the citation appears next to the sentence automatically.

Can you help me preserve these citations while also adding a memory of the conversation history?
Let me check the knowledge base for you. One moment! :books:
----------------------
To maintain the citation functionality of the CitationQueryEngine while also keeping track of the conversation history, you can consider using the CondenseQuestionChatEngine provided by LlamaIndex. This chat engine generates a standalone question from the conversation context and the last message, then queries the query engine for a response.

Here is an example of how to use CondenseQuestionChatEngine:

Plain Text
from llama_index.core.chat_engine import CondenseQuestionChatEngine
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.core.query_engine import CitationQueryEngine

# Build the citation query engine from your existing index
# (`index` is assumed to be defined already)
citation_query_engine = CitationQueryEngine.from_args(index)

# Initialize your memory buffer
memory = ChatMemoryBuffer.from_defaults(token_limit=3900)

# Initialize your chat engine; pass an instance, not the CitationQueryEngine class
chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=citation_query_engine,
    memory=memory,
    verbose=True,
)

# Use the chat engine
response = chat_engine.chat("Your question here")
print(response)


In this example, citation_query_engine is your instance of CitationQueryEngine; the class itself cannot be passed as query_engine. The chat method of chat_engine will maintain a conversation history and return the query engine's response, citations included.

Please note that the CondenseQuestionChatEngine does not directly provide the citation functionality of the CitationQueryEngine. The citation functionality is provided by the CitationQueryEngine itself. The CondenseQuestionChatEngine simply uses the CitationQueryEngine to generate responses.
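
The delegation described above can be sketched in plain Python. This is a toy model, not the library's code: `ToyCondenseQuestionChat`, `fake_condense`, and `fake_citation_query` are hypothetical names, the condensing step is a string trick instead of an LLM call, and skipping the condense step on the first turn is a choice made for this sketch only.

```python
class ToyCondenseQuestionChat:
    """Hypothetical model of a condense-then-query chat loop."""

    def __init__(self, condense_fn, query_fn):
        self.condense_fn = condense_fn  # (history, message) -> standalone question
        self.query_fn = query_fn        # question -> answer text with citations
        self.history = []               # (role, text) pairs

    def chat(self, message):
        # Rewrite the follow-up into a standalone question when history exists.
        if self.history:
            question = self.condense_fn(self.history, message)
        else:
            question = message
        # The answer comes straight from the query engine, unchanged.
        answer = self.query_fn(question)
        self.history.append(("user", message))
        self.history.append(("assistant", answer))
        return answer


def fake_condense(history, message):
    # A real engine would ask an LLM; here the last user turn is appended.
    last_user = [text for role, text in history if role == "user"][-1]
    return f"{message} (context: {last_user})"


seen_questions = []


def fake_citation_query(question):
    # Stand-in for a citation query engine; records what it was asked.
    seen_questions.append(question)
    return "The factory costs $500 [1]."


chat = ToyCondenseQuestionChat(fake_condense, fake_citation_query)
first = chat.chat("How much is the cheese factory?")
second = chat.chat("Who recommended it?")
print(seen_questions[1])
print(second)
```

Because the chat layer only rewrites the question and stores the turns, whatever citation markers the query engine emits reach the caller untouched.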

For more details, you can refer to the LlamaIndex documentation.
----------------------
@kapa.ai can you give me an example value for the condense_question_prompt argument of the CondenseQuestionChatEngine constructor?
Let me check the knowledge base for you. One moment! :books:
----------------------
Sure, the condense_question_prompt is a template that instructs the model on how to condense the conversation history and the latest user message into a standalone question. Here is an example of how you can define a condense_question_prompt:

Plain Text
from llama_index.core import PromptTemplate

condense_question_prompt = PromptTemplate(
    """\
Given a conversation (between Human and Assistant) and a follow up message from Human, \
rewrite the message to be a standalone question that captures all relevant context \
from the conversation.

<Chat History>
{chat_history}

<Follow Up Message>
{question}

<Standalone question>
"""
)


In this template, {chat_history} will be replaced with the actual conversation history and {question} will be replaced with the latest user message. The model will then generate a standalone question based on this prompt.
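
To see what the filled-in prompt might look like, here is a small sketch that uses plain `str.format` in place of `PromptTemplate`. The `render_history` helper and the "Human"/"Assistant" role labels are assumptions made for illustration; the library may render the chat history differently.

```python
# Same wording as the template above, written as a plain Python string.
TEMPLATE = (
    "Given a conversation (between Human and Assistant) and a follow up "
    "message from Human, rewrite the message to be a standalone question "
    "that captures all relevant context from the conversation.\n\n"
    "<Chat History>\n{chat_history}\n\n"
    "<Follow Up Message>\n{question}\n\n"
    "<Standalone question>\n"
)


def render_history(turns):
    # Hypothetical helper: one "Role: text" line per turn.
    return "\n".join(f"{role}: {text}" for role, text in turns)


turns = [
    ("Human", "How much is the new cheese factory?"),
    ("Assistant", "It costs $500 [1]."),
]

filled = TEMPLATE.format(
    chat_history=render_history(turns),
    question="Who recommended it?",
)
print(filled)
```

The printed text is roughly what the model would receive before producing the standalone question.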

You can then pass this condense_question_prompt to the CondenseQuestionChatEngine:

Plain Text
chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine=query_engine,  # your CitationQueryEngine instance
    condense_question_prompt=condense_question_prompt,
    chat_history=custom_chat_history,  # optional: a list of ChatMessage objects you define
    verbose=True,
)


For more details, you can refer to the LlamaIndex documentation.
----------------------
It should be like this:

from llama_index.core import Settings

chat_engine = CondenseQuestionChatEngine(
    query_engine=query_engine,  # replace with your CitationQueryEngine instance
    condense_question_prompt=condense_question_prompt,
    memory=memory,
    llm=Settings.llm,
    verbose=True,
)