@Logan M @WhiteFang_Jr Please help 🥺
Have you tried prompt engineering this? Instruct the chatbot to do this via your system prompt
Yes, the system_prompt includes "Using the context and not any other prior knowledge, answer in a detailed manner".
Still, the chatbot is responding to general questions like 'Who created Harry Potter'.
Hi, which chat mode are you using?
Tried context and condense_plus_context, same behaviour
Not able to provide a system_prompt. Getting the exception "system_prompt" is not supported for CondenseQuestionChatEngine.
How can we provide a system_prompt with condense chat mode? Also, using condense chat mode is throwing the following exception
@WhiteFang_Jr @Logan M Please help 🥺
@Jerry Liu Please help 🥺
This is how I am defining the chat engine
System prompt is indeed not supported
All condense question does is rewrite your query in the context of the chat history, run the query engine, and then return whatever the query engine returns.
So you can either change the prompt that rewrites the query, or change the query engine prompts
It's not quite a "chat" interface, so a system prompt doesn't make sense here
@Logan M Back to the original question then. We are building a RAG pipeline on top of our documents and want to use a chat engine for our chatbot implementation so that we have memory of the current chat session. How can we make the chat engine respond only to queries in the context of our documents, and not to general questions like 'Who created Harry Potter'?
Technically that is the default prompt in a query engine (i.e. only use the provided context to answer)
But not every LLM follows those instructions perfectly. So you might have to tweak that
The prompt to tweak depends on what chat engine you are using. For CondenseQuestionChatEngine, it would just be the query engine prompts
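To make that concrete, here is a sketch of swapping in a stricter QA prompt on the query engine. The template wording and the name STRICT_QA_TEMPLATE are my own (tune for your LLM); the `response_synthesizer:text_qa_template` key is the one LlamaIndex's `update_prompts` uses for the response synthesizer:

```python
# A stricter QA prompt (illustrative wording, not an official template).
# {context_str} and {query_str} are the placeholders LlamaIndex fills in.
STRICT_QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Answer the query using ONLY the context above and no prior knowledge.\n"
    "If the answer is not in the context, reply exactly:\n"
    "'I can only answer questions about the provided documents.'\n"
    "Query: {query_str}\n"
    "Answer: "
)

# Applying it to an existing query engine (llama_index >= 0.10 import path):
# from llama_index.core import PromptTemplate
# query_engine.update_prompts(
#     {"response_synthesizer:text_qa_template": PromptTemplate(STRICT_QA_TEMPLATE)}
# )
```

Since CondenseQuestionChatEngine just wraps the query engine, the chat engine would pick this up automatically.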
SMH, unable to follow clearly. I am using gpt-3.5; is there a sample code example which I can use to check your suggestion @Logan M ?
Hi @Logan M, I'm assisting Mike (Cool) here. So is there any form of the chat_engine that doesn't answer questions outside of the context? I have tried all of them, as well as 20+ different variations of prompts instructing the LLM not to answer any questions outside of the context it is given, to no avail.
Prompt tuning is really the only way 🙂
That's the only way to control an LLM
@Logan M The Query Engine works great for our requirement, but the Chat Engine does not. The Chat Engine is not honouring our system prompt; this feels like an issue with the Chat Engine rather than the LLM itself. Attaching the snippet.
Also, while using CondenseQuestionChatEngine, we are facing a consistent error; this feels like a bug @Logan M @WhiteFang_Jr @Jerry Liu . Could someone please help fix this?
OK, let's tone down the pings a bit 🙂
How do you know it's not respecting the template? What did you use as the template?
Strange, we are still getting a consistent error while using CondenseQuestionChatEngine. Let me check the llama_index version we are using
Yea, maybe try using the latest version and see if it works for you 🙂
Hey @Logan M. I got this working with the CondenseQuestionChatEngine, but when I ask the LLM follow up questions, it just rewords the question I asked it before. For example when I ask it: "How do I start an EC2 instance?" it gives me an answer from the context provided in my vector store, but then I ask it "Who is harry potter?" and it just rewords it into something like "Can you tell me about starting an EC2 instance?" and returns a response based off that.
You can modify the prompt that re-words the question
# llama_index >= 0.10 import paths
from llama_index.core import PromptTemplate
from llama_index.core.chat_engine import CondenseQuestionChatEngine

# The default prompt that condenses the chat history + follow-up into a standalone question
DEFAULT_TEMPLATE = """\
Given a conversation (between Human and Assistant) and a follow up message from Human, \
rewrite the message to be a standalone question that captures all relevant context \
from the conversation.

<Chat History>
{chat_history}

<Follow Up Message>
{question}

<Standalone question>
"""

prompt = PromptTemplate(DEFAULT_TEMPLATE)

chat_engine = CondenseQuestionChatEngine.from_defaults(
    query_engine, condense_question_prompt=prompt
)
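For the follow-up-rewording problem described above (off-topic follow-ups getting forced into the previous topic), one option is a condense template that declines to merge unrelated follow-ups. The extra sentence below is my own tweak of the default, not an official template:

```python
# Tweaked condense template: off-topic follow-ups pass through unchanged
# instead of being rewritten into the previous topic.
CUSTOM_CONDENSE_TEMPLATE = """\
Given a conversation (between Human and Assistant) and a follow up message from Human, \
rewrite the message to be a standalone question that captures all relevant context \
from the conversation.
If the follow up message is unrelated to the conversation, return it unchanged.

<Chat History>
{chat_history}

<Follow Up Message>
{question}

<Standalone question>
"""

# Plugged in exactly like the default template:
# from llama_index.core import PromptTemplate
# from llama_index.core.chat_engine import CondenseQuestionChatEngine
# chat_engine = CondenseQuestionChatEngine.from_defaults(
#     query_engine, condense_question_prompt=PromptTemplate(CUSTOM_CONDENSE_TEMPLATE)
# )
```

With this, an unrelated question like "Who is harry potter?" should reach the query engine as-is, where a strict QA prompt can then refuse it.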
Imo this chat engine is pretty janky (because it forces the query engine to run every time); not one I'd normally recommend 🙂
An agent with return_direct tools, or the condense+context chat engine is usually my go-to recommendation
Awesome, thanks Logan, I will do some more testing today. With condense+context you can't pass in a prompt, right? The whole purpose of using the chat_engine over the query_engine is to store chat history so the LLM has the proper context at first, and then you can ask it to elaborate on the answer it gives.
The prompt would be the system prompt in condense+context -- you can specify a similar-ish template that is used on every chat message
from llama_index.core.chat_engine import CondensePlusContextChatEngine

# Prompt used to answer each message from the retrieved context
DEFAULT_CONTEXT_PROMPT_TEMPLATE = """
The following is a friendly conversation between a user and an AI assistant.
The assistant is talkative and provides lots of specific details from its context.
If the assistant does not know the answer to a question, it truthfully says it
does not know.

Here are the relevant documents for the context:
{context_str}

Instruction: Based on the above documents, provide a detailed answer for the user question below.
Answer "don't know" if not present in the document.
"""

# Prompt used to condense the follow-up into a standalone question
DEFAULT_CONDENSE_PROMPT_TEMPLATE = """
Given the following conversation between a user and an AI assistant and a follow up question from user,
rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""

chat_engine = CondensePlusContextChatEngine.from_defaults(
    retriever,
    context_prompt=DEFAULT_CONTEXT_PROMPT_TEMPLATE,
    condense_prompt=DEFAULT_CONDENSE_PROMPT_TEMPLATE,
)
In this implementation, where would you pass in your VectorStoreIndex?
So, this one only takes a retriever:
retriever = index.as_retriever(similarity_top_k=2)
chat_engine = CondensePlusContextChatEngine.from_defaults(retriever, ...)
Got it.
I set index = VectorStoreIndex.from_vector_store then retriever = index.as_retriever, and the rest...
However, I'm still getting answers that are outside of my context, i.e. the LLM still answers the question "who is harry potter"
Do you know of any guardrails outside of prompt engineering to prevent it from using knowledge gpt3.5 is trained on?
It's really only prompt engineering 🙂
or fine-tuning I guess, if you wanted to go that route
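One programmatic check sometimes layered on top (a sketch under my own assumptions, not something suggested in this thread): look at the retriever's similarity scores before letting the LLM answer, and short-circuit when nothing in the index is close to the question. The function name and the 0.75 cutoff are illustrative:

```python
def passes_retrieval_guardrail(scores, threshold=0.75):
    """Return True only if the best-matching retrieved chunk clears the threshold.

    In LlamaIndex, `scores` would be e.g.
    [node.score for node in retriever.retrieve(question)];
    the 0.75 cutoff is a made-up starting point to tune per embedding model.
    """
    return bool(scores) and max(scores) >= threshold

# Sketch of wiring it in front of the chat engine:
# nodes = retriever.retrieve(user_message)
# if passes_retrieval_guardrail([n.score for n in nodes]):
#     response = chat_engine.chat(user_message)
# else:
#     response = "I can only answer questions about the provided documents."
```

This doesn't change what the LLM knows; it just keeps clearly off-topic questions from reaching it at all.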