Hello Experts,
We are using a chat engine for our chatbot implementation (on top of RAG over our documents). We want the chat engine to respond only to queries within the context of our documents, and not to general questions like 'Who created Harry Potter'. How can we enforce that the chatbot behaves/responds only within our documents' context?
@Logan M @WhiteFang_Jr Please help πŸ₯Ί
Have you started with prompt engineering for this? Instruct the chatbot to do this via your system prompt.
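Roughly something like this (a sketch; `index` and the exact refusal wording are placeholders):
Plain Text
# Rough sketch: constrain answers to retrieved context via the system prompt.
# `index` is assumed to be an existing VectorStoreIndex over your documents.
chat_engine = index.as_chat_engine(
    chat_mode="context",
    system_prompt=(
        "You are an assistant for our internal documents. Answer ONLY from the "
        "provided context. If the question cannot be answered from the context, "
        "reply: 'I can only answer questions about our documents.'"
    ),
)
print(chat_engine.chat("Who created Harry Potter?"))
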
Yes, as part of the system_prompt we mentioned "Using the context and not any other prior knowledge, answer in detailed manner".
The chatbot is still responding to general questions like 'Who created Harry Potter'.
Hi, Which chat mode are you using?
Tried context and condense_plus_context, same behaviour
Yea so both allow general interaction as well: https://docs.llamaindex.ai/en/stable/examples/chat_engine/chat_engine_context/
Try using condense + a good system prompt that has the bot simply refuse any out-of-context question if no relevant context is found
Not able to provide a system_prompt; getting the exception "system_prompt" is not supported for CondenseQuestionChatEngine.
How can we provide a system_prompt with the condense chat mode? Also, using the condense chat mode is throwing the following exception:
[Attachment: image.png]
@WhiteFang_Jr @Logan M Please help πŸ₯Ί
@Jerry Liu Please help πŸ₯Ί
How are you defining your chat engine?
I can see that you can pass system_prompt as a kwarg for CondenseQuestionChatEngine: https://github.com/run-llama/llama_index/blob/63a0d4fac912e5262d79ffc7a1c22225d2ec8407/llama-index-core/llama_index/core/indices/base.py#L451
Also if creating chat_engine directly:
Plain Text
from llama_index.core.chat_engine.condense_question import CondenseQuestionChatEngine
chat_engine = CondenseQuestionChatEngine.from_defaults(system_prompt="ADD_YOUR_SYSTEM_PROMPT",...)

There is system_prompt option there as well: https://github.com/run-llama/llama_index/blob/63a0d4fac912e5262d79ffc7a1c22225d2ec8407/llama-index-core/llama_index/core/chat_engine/condense_question.py#L81
Tried this as well, still facing the same exception: "system_prompt is not supported for CondenseQuestionChatEngine.". It is coming from here: https://github.com/run-llama/llama_index/blob/63a0d4fac912e5262d79ffc7a1c22225d2ec8407/llama-index-core/llama_index/core/chat_engine/condense_question.py#L96
This is how I am defining the chat engine
[Attachment: image.png]
System prompt is indeed not supported

All the condense-question engine does is rewrite your query using the chat history, run the query engine, and return whatever the query engine returns.

So you can either change the prompt that rewrites the query, or change the query engine prompts.
It's not quite a "chat" interface, so a system prompt doesn't make sense here.
@Logan M Back to the original question then. On top of our documents we are building a RAG pipeline, and we want to use a chat engine for our chatbot implementation so that we have memory of the current chat session. How can we make the chat engine respond only to queries within the context of our documents, and not answer general questions like 'Who created Harry Potter'?
Technically that is the default prompt in a query engine (i.e. only use the provided context to answer)

But not every LLM follows those instructions perfectly, so you might have to tweak it.

The prompt to tweak depends on which chat engine you are using. For CondenseQuestionChatEngine, it would just be the query engine prompts.
SMH πŸ™„, I'm unable to understand this clearly. I am using gpt-3.5; is there a sample code example which I can use to check your suggestion, @Logan M?
Hi @Logan M, I'm assisting Mike (Cool) here. So is there any form of the chat_engine that doesn't answer questions that are outside of the context? I have tried all of them, as well as 20+ different variations of prompts instructing the LLM not to answer any questions outside of the context it is given, to no avail.
Prompt tuning is really the only way πŸ˜… That's the only way to control an LLM
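For CondenseQuestionChatEngine, that means tightening the QA prompt on the underlying query engine, roughly like this (a sketch; the prompt wording is just a placeholder):
Plain Text
from llama_index.core import PromptTemplate

# Stricter QA prompt for the query engine wrapped by CondenseQuestionChatEngine
STRICT_QA_PROMPT = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Using ONLY the context above and no prior knowledge, answer the query.\n"
    "If the answer is not in the context, reply exactly: "
    "'I can only answer questions about our documents.'\n"
    "Query: {query_str}\n"
    "Answer: "
)

# query_engine is the existing query engine built over the document index
query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": STRICT_QA_PROMPT}
)
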
@Logan M The query engine works great for our requirement, but the chat engine does not. The chat engine is not honouring our system prompt; it feels like an issue with the chat engine rather than the LLM itself. Attaching the snippet.
[Attachment: image.png]
Also, while using CondenseQuestionChatEngine, we are facing a consistent error; this feels like a bug @Logan M @WhiteFang_Jr @Jerry Liu. Could someone please help fix this?
[Attachment: image.png]
OK, let's tone down the pings a bit πŸ˜…

How do you know it's not respecting the template? What did you use as the template?
Sorry Logan πŸ₯Ί
Strange, we are still getting a consistent error while using CondenseQuestionChatEngine. Let me check the llama_index version we are using
Yea, maybe try and use the latest and see if it works for you πŸ‘
Hey @Logan M. I got this working with the CondenseQuestionChatEngine, but when I ask the LLM follow up questions, it just rewords the question I asked it before. For example when I ask it: "How do I start an EC2 instance?" it gives me an answer from the context provided in my vector store, but then I ask it "Who is harry potter?" and it just rewords it into something like "Can you tell me about starting an EC2 instance?" and returns a response based off that.
You can modify the prompt that re-words the question

Plain Text
from llama_index.core import PromptTemplate
from llama_index.core.chat_engine import CondenseQuestionChatEngine

DEFAULT_TEMPLATE = """\
Given a conversation (between Human and Assistant) and a follow up message from Human, \
rewrite the message to be a standalone question that captures all relevant context \
from the conversation.

<Chat History>
{chat_history}

<Follow Up Message>
{question}

<Standalone question>
"""

prompt = PromptTemplate(DEFAULT_TEMPLATE)

# query_engine is the existing query engine built over the document index
chat_engine = CondenseQuestionChatEngine.from_defaults(
  query_engine, condense_question_prompt=prompt
)


Imo this chat engine is pretty janky (because it forces the query engine to run every time), not one I'd normally recommend πŸ˜… An agent with return_direct tools, or the condense+context chat engine is usually my go-to recommendation
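For reference, the agent route looks roughly like this (a sketch; the tool name, description, and the choice of OpenAIAgent are assumptions):
Plain Text
from llama_index.core.tools import QueryEngineTool
from llama_index.agent.openai import OpenAIAgent

# Wrap the document query engine as a tool; return_direct=True returns the
# tool's answer as-is instead of letting the agent rewrite it.
docs_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="company_docs",
    description="Answers questions about our internal documents.",
    return_direct=True,
)

agent = OpenAIAgent.from_tools(
    [docs_tool],
    system_prompt=(
        "Only answer questions by calling the company_docs tool. "
        "If the tool has no relevant information, say you cannot help."
    ),
)
print(agent.chat("Who created Harry Potter?"))
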
Awesome, thanks Logan, I will do some more testing today. With condense+context you can't pass in a prompt, right? The whole purpose of using chat_engine over query_engine is simply to store chat history so the LLM has the proper context at first, and then you can ask it to elaborate on the answer it gives.
The prompt would be the system prompt in condense+context -- you can specify a similar template that is used on every chat message

Plain Text
from llama_index.core.chat_engine import CondensePlusContextChatEngine

DEFAULT_CONTEXT_PROMPT_TEMPLATE = """
  The following is a friendly conversation between a user and an AI assistant.
  The assistant is talkative and provides lots of specific details from its context.
  If the assistant does not know the answer to a question, it truthfully says it
  does not know.

  Here are the relevant documents for the context:

  {context_str}

  Instruction: Based on the above documents, provide a detailed answer for the user question below.
  Answer "don't know" if not present in the document.
  """

DEFAULT_CONDENSE_PROMPT_TEMPLATE = """
  Given the following conversation between a user and an AI assistant and a follow up question from user,
  rephrase the follow up question to be a standalone question.

  Chat History:
  {chat_history}
  Follow Up Input: {question}
  Standalone question:"""

chat_engine = CondensePlusContextChatEngine.from_defaults(retriever, context_prompt=DEFAULT_CONTEXT_PROMPT_TEMPLATE, condense_prompt=DEFAULT_CONDENSE_PROMPT_TEMPLATE)
In this implementation, where would you pass in your VectorStoreIndex?
So, this only uses the retriever, so

Plain Text
retriever = index.as_retriever(similarity_top_k=2)
chat_engine = CondensePlusContextChatEngine.from_defaults(retriever, ...)
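So end to end it would look roughly like this (a sketch; `vector_store` and the two prompt templates from above are assumed to exist already):
Plain Text
from llama_index.core import VectorStoreIndex
from llama_index.core.chat_engine import CondensePlusContextChatEngine

# `vector_store` and the two prompt templates above are assumed to exist already
index = VectorStoreIndex.from_vector_store(vector_store)
retriever = index.as_retriever(similarity_top_k=2)

chat_engine = CondensePlusContextChatEngine.from_defaults(
    retriever,
    context_prompt=DEFAULT_CONTEXT_PROMPT_TEMPLATE,
    condense_prompt=DEFAULT_CONDENSE_PROMPT_TEMPLATE,
)

print(chat_engine.chat("How do I start an EC2 instance?"))
print(chat_engine.chat("Who is Harry Potter?"))  # should be refused per the context prompt
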
Got it.

I set index = VectorStoreIndex.from_vector_store then retriever = index.as_retriever, and the rest...
However, I'm still getting answers that are outside of my context, i.e. the LLM still answers the question "who is harry potter".
Do you know of any guardrails outside of prompt engineering to prevent it from using knowledge gpt-3.5 is trained on?
It's really only prompt engineering πŸ˜…
or fine-tuning I guess, if you wanted to go that route