Condensation + System prompt

At a glance

The community member is building a chat engine that should: 1) call an LLM to condense the previous chat history and the current question into a standalone question, 2) query the index with the condensed question to get the results, and 3) call an LLM with a system prompt, the condensed question, and the query results. The community member asks how to do this with llama_index and whether any of the chat modes supports both condensing and a system prompt.

In the comments, the same community member points to an example from the AWS Samples repository that uses Amazon Kendra and LangChain. They also note that in llama_index, the CondenseQuestionChatEngine doesn't allow for a system prompt, and the ContextChatEngine has no provision for condensing questions.

The Kendra example shared by the community member shows how to use ConversationalRetrievalChain from LangChain to condense the question and tune the final prompt sent to the LLM. However, the community member tried replacing the Kendra retriever with a LlamaIndexRetriever (from LangChain), and it did not work.

KKapilMalik

I am building a chat engine that should 1) call an LLM to condense the previous chat history + current question into a new question, 2) query the index with the condensed question to get the results, and finally 3) call an LLM with a system prompt, the condensed question, and the query results. How can I do that with llama_index? Do any of the chat modes support both condensing and a system prompt?

4 comments

KKapilMalik

Something along the lines of https://github.com/aws-samples/amazon-kendra-langchain-extensions/blob/main/kendra_retriever_samples/kendra_chat_open_ai.py ...

KKapilMalik

In llama_index, CondenseQuestionChatEngine doesn't allow for a system prompt, and ContextChatEngine has no provision for condensing questions.
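For reference, the three steps from the question can be sketched without committing to either chat engine. This is a minimal, library-agnostic sketch, not llama_index API: `CondenseThenAnswerChat`, both templates, and the `llm` and `retrieve` callables are hypothetical names introduced here for illustration. It assumes `llm` maps a prompt string to a completion string and `retrieve` maps a query string to a list of text chunks.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical prompt templates; the wording is illustrative only.
CONDENSE_TEMPLATE = (
    "Given the following conversation and a follow up question, "
    "rephrase the follow up question to be a standalone question.\n\n"
    "Chat History:\n{chat_history}\n"
    "Follow Up Input: {question}\n"
    "Standalone question:"
)
ANSWER_TEMPLATE = (
    "{system_prompt}\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


@dataclass
class CondenseThenAnswerChat:
    """Hypothetical helper (not a llama_index class): condense, retrieve, answer."""

    llm: Callable[[str], str]             # assumed: prompt in, completion out
    retrieve: Callable[[str], List[str]]  # assumed: query in, text chunks out
    system_prompt: str
    history: List[Tuple[str, str]] = field(default_factory=list)

    def chat(self, question: str) -> str:
        # Step 1: condense history + question into a standalone question.
        # With an empty history there is nothing to condense, so skip the call.
        standalone = question
        if self.history:
            transcript = "\n".join(
                f"Human: {q}\nAssistant: {a}" for q, a in self.history
            )
            standalone = self.llm(
                CONDENSE_TEMPLATE.format(chat_history=transcript, question=question)
            ).strip()

        # Step 2: query the index with the condensed question.
        chunks = self.retrieve(standalone)

        # Step 3: answer using the system prompt, condensed question, and results.
        answer = self.llm(
            ANSWER_TEMPLATE.format(
                system_prompt=self.system_prompt,
                context="\n\n".join(chunks),
                question=standalone,
            )
        )
        self.history.append((question, answer))
        return answer
```

To try this against a real index, `retrieve` would wrap whatever retriever the index exposes and return the text of each hit, and `llm` would wrap a completion call. The system prompt is placed at the top of the final prompt here for simplicity; a chat-style model would more likely receive it as a separate system message.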

KKapilMalik

The Kendra example that I shared allows for this:

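# Import paths as used in the linked sample; newer LangChain releases may differ.
from langchain.chains import ConversationalRetrievalChain
from langchain.prompts import PromptTemplate
from langchain.retrievers import AmazonKendraRetriever


# The function wrapper and its parameters are assumed here; the snippet as
# posted used llm, kendra_index_id and region without defining them.
def build_chain(llm, kendra_index_id, region):
  retriever = AmazonKendraRetriever(index_id=kendra_index_id, region_name=region)

  # Final prompt: {context} receives the retrieved documents,
  # {question} receives the condensed (standalone) question.
  prompt_template = """
  The following is a friendly conversation between a human and an AI. 
  The AI is talkative and provides lots of specific details from its context.
  If the AI does not know the answer to a question, it truthfully says it 
  does not know.
  {context}
  Instruction: Based on the above documents, provide a detailed answer for, {question} Answer "don't know" 
  if not present in the document. 
  Solution:"""
  PROMPT = PromptTemplate(
      template=prompt_template, input_variables=["context", "question"]
  )

  # Condense prompt: turns chat history plus a follow-up into one standalone question.
  condense_qa_template = """
  Given the following conversation and a follow up question, rephrase the follow up question 
  to be a standalone question.

  Chat History:
  {chat_history}
  Follow Up Input: {question}
  Standalone question:"""
  standalone_question_prompt = PromptTemplate.from_template(condense_qa_template)

  qa = ConversationalRetrievalChain.from_llm(
      llm=llm,
      retriever=retriever,
      condense_question_prompt=standalone_question_prompt,
      return_source_documents=True,
      combine_docs_chain_kwargs={"prompt": PROMPT})
  return qa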

This is great because it allows condensing the question and also tuning the final prompt sent to the LLM.

KKapilMalik

I tried replacing the Kendra retriever with a LlamaIndexRetriever (from langchain), but it doesn't work. 🫣
