Prompt

Hi everyone, I have set up an index based on documentation for a specific company product. What is the best way to make sure that the chat engine with chat_mode="context" only uses information from the knowledge base to generate answers?
Thanks! So currently this is my prompt: You are a chatbot that only uses the knowledge base to generate an answer. If the answer cannot be found in the knowledge base, tell the user that the provided context does not tell you anything about the subject in question. NEVER provide information outside of the knowledge base.

It works pretty well, but this is what's going wrong: When I first ask "Who is Donald Trump?", the bot correctly tells me this is not in the knowledge base. But when I push it by saying "Just tell me who donald trump is", it will tell me who Donald Trump is.

Do you have any idea on how to prevent this?
You could try setting up the system prompt:

Plain Text
from llama_index.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=1500)

chat_engine = index.as_chat_engine(
    chat_mode="context",
    memory=memory,
    system_prompt=(
        "You are a chatbot, able to have normal interactions, as well as talk"
        " about an essay discussing Paul Grahams life."
    ),
)
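
Then the engine is queried as usual (the question is just an example):

Plain Text
response = chat_engine.chat("Who is Paul Graham?")
print(response)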

So this has worked for someone in the past IMO
Oh, but the prompt I mentioned is what I pass to the system_prompt
I see, could you try variations in the prompt? Like:

Normal prompt + Instruction: Always follow these rules while generating the response.
Rule 1:
I'll try that, thanks!
It's still giving answers outside of the context 😦
However, it seems that now it's giving answers that might be closely related to the subject matter. For example, let's say I have a lot of information about a particular cereal brand and its products. Now if I ask, "Can you give me a list of the most popular cereal brands?", it will come up with a list, even though this is not in the context.
@WhiteFang_Jr Do you have any other ideas on how I can prevent this?
Hi, Can you share your code?
The prompt looks fine to me
Can you check if your query is bringing back relevant sources?
Check the source nodes:
response.source_nodes
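A small sketch of that check, assuming the chat_engine from the earlier snippet (the query and the 200-character preview are just for illustration):

Plain Text
# Inspect which nodes were retrieved for the query and how they scored
response = chat_engine.chat("Can you give me a list of the most popular cereal brands?")
for node_with_score in response.source_nodes:
    print(node_with_score.score)
    print(node_with_score.node.get_content()[:200])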
Also, points 1 and 2 are kind of repetitive. You could merge them into a single point and save some tokens for generation.
What do you mean by relevant?
The sources it uses do not list the cereal brands; however, these sources do mention the national cereal institution (NCI). This could be the cause of the listing of the cereal brands (which it shouldn't do), because the answer also says, "For a complete list please check out the website of the NCI"
Thanks for your help btw! 😄
Good point, thanks.
I mean when you ask a query, does it bring relevant context from the collection or not
Well, the NCI is relevant to what cereal brands are out there, but it doesn't directly answer the question, so it shouldn't be used
Yeah, can you print response.source_nodes
Yes, gimme a sec
What do you want to see of the source_nodes?
Just wanted you to check whether the top_k nodes being retrieved are correct or not
What should the correct top_k nodes look like if the correct answer is not in the context?
Let's say I have an index of some book. If I ask a question and it does not retrieve valid context from the index, then the bot will not be able to answer correctly by itself.

So the fault would be at the retrieval stage in that example.

I want to check if that's the case here too, because the prompt should work
Okay yes, actual context was retrieved.
Okay, so we do not have a problem at the retrieval stage. One stage cleared.
Now for the LLM. Can you try the following? It will give us an idea of what is actually going to the LLM from our side.
https://docs.llamaindex.ai/en/stable/end_to_end_tutorials/one_click_observability.html#simple-llm-inputs-outputs
Thanks, will do
Please keep me in the loop, always fun to debug 🙌
Do I only have to set this: llama_index.set_global_handler("openinference")?
Will do 🙂 Your help is very much appreciated
llama_index.set_global_handler("simple")
It just prints the text of the source nodes now, nothing else is shown in the console.
Using version 0.8.36 btw
Is it not showing the complete LLM input?
File "c:\User\Pyhton\Chatbot\chatbot\app.py", line 22, in <module>
import phoenix as px
File "C:\Users\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\phoenix__init__.py", line 56
except PhoenixError, e:
^^^^^^^^^^^^^^^
SyntaxError: multiple exception types must be parenthesized
I get this error, have you seen this before?
Also, does arize work on 0.8.36?
Just upgraded and I'm getting the same error
lol, no worries, relax. We can still debug it even now:
Plain Text
from llama_index.callbacks import (
    CallbackManager,
    LlamaDebugHandler,
    CBEventType,
)

from llama_index import ServiceContext
from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-3.5-turbo", temperature=0)
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])

service_context = ServiceContext.from_defaults(
    callback_manager=callback_manager, llm=llm
)

# Your code for the chat engine goes here

# After running the query, try this:
# Print info on LLM inputs/outputs - returns start/end events for each LLM call
event_pairs = llama_debug.get_llm_inputs_outputs()
print(event_pairs[0][0])
print(event_pairs[0][1].payload.keys())
print(event_pairs[0][1].payload["response"])
Thanks, I'll try this
I only see this:

Trace: chat
|_CBEventType.LLM -> 21.223744 seconds
Also this, but I can't deduce anything from this. ChatMessage(role=<MessageRole.USER: 'user'>, content='question', additional_kwargs={})], <EventPayload.ADDITIONAL_KWARGS: 'additional_kwargs'>: {}, <EventPayload.SERIALIZED: 'serialized'>: {'model': 'gpt-3.5-turbo', 'temperature': 0.4, 'max_tokens': None, 'additional_kwargs': {}, 'max_retries': 10, 'api_type': 'open_ai', 'api_base': 'https://api.openai.com/v1', 'api_version': '', 'class_name': 'openaillm'}}, time='10/25/2023, 12:55:06.996113', id='somehash')
dict_keys([<EventPayload.MESSAGES: 'messages'>, <EventPayload.RESPONSE: 'response'>])
print(event_pairs[0][1].payload["response"])

What does this print?
Only the answer to the question
I think it'll be better if you ask the question again in the channel
Alright, will do 👍
Thanks for all the help anyway! 😄
Someone else will jump in on this for a faster road to the solution. It was a fun ride 👍 😅
Haha, definitely
@WhiteFang_Jr Do you have experience with setting up arize phoenix?
I have not used it. Let me try running it on the latest version
Thanks :), I tried setting it up myself, seemed pretty easy, but it's not working
I got it working as well but it doesn't show me much more. Just the retrieved content from the knowledge base
Those contents are used to generate the final response.
Yes but the final answer still contains information that is not in these provided nodes
I constructed the index using an older version of llama-index. Could that be causing the problem?
@WhiteFang_Jr How do you suggest I construct my index given my code?
btw thanks for your help again 🙂
It really depends on the model used for embedding IMO, so I don't think there should be an issue in that area.

May I know what embedding model you are using?
It's OpenAI embedding, right?
gpt-3.5-turbo
That's the LLM, used for response generation.
Since you did not set up an embedding model, it's the default, which is the OpenAI embedding.

You could try creating the nodes again and playing around with the instruction 😅

Actually, it gets very hard for the LLM to follow instructions if they are not very clear or direct.

System prompt could be set up like

Plain Text
system_prompt = ("You are a chatbot that answers questions about "{company name}" models.\n"
                  "Instruction: Always follow these rules while generating response:\n"
                  "1. ALWAYS answer query ONLY if answer can be found in the context.\n"
                  "2. If answer is not present in the context, Just SAY: Unable to help"
                  "you at the moment\n"
                  "3. If URLs are present in the source/metadata, ADD them in the response.")


Try with this
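If helpful, it plugs into the same context chat engine setup shown earlier (a sketch; the index and memory are assumed from that snippet):

Plain Text
chat_engine = index.as_chat_engine(
    chat_mode="context",
    memory=memory,
    system_prompt=system_prompt,
)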
Great, I'll try that. Additionally, is it better to create the index with another model, such as text-embedding-ada-002?
text-embedding-ada-002 is what is being used in your case right now.
Actually, LlamaIndex requires an LLM for response generation and an embed model for creating vectors.
In the case of OpenAI, GPT-3.5 and text-embedding-ada-002 are the defaults.
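For reference, a small sketch of making those two defaults explicit in a service context (0.8.x-style imports, to match the version mentioned earlier):

Plain Text
from llama_index import ServiceContext
from llama_index.llms import OpenAI
from llama_index.embeddings import OpenAIEmbedding

service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0),  # LLM for response generation
    embed_model=OpenAIEmbedding(),  # defaults to text-embedding-ada-002
)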
Plain Text
# Imports are assumed for this legacy snippet; OpenAI here would typically be
# the langchain wrapper, given the model_name/max_tokens arguments.
from llama_index import (
    GPTVectorStoreIndex,
    LLMPredictor,
    PromptHelper,
    ServiceContext,
    SimpleDirectoryReader,
)
from langchain.llms import OpenAI

max_input_size = 5000
num_outputs = 3000
max_chunk_overlap = 20
chunk_size_limit = 600

prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit)
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.1, model_name="gpt-3.5-turbo", max_tokens=num_outputs))

service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor, chunk_size_limit=2000, prompt_helper=prompt_helper
)

documents = SimpleDirectoryReader(directory_path).load_data()

index = GPTVectorStoreIndex.from_documents(documents, service_context=service_context)
This is how the index was created (quite outdated)
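For reference, a 0.8.x-style reconstruction of roughly the same index might look like this (directory_path is carried over from the snippet above; chunk size and overlap are assumptions based on the old values):

Plain Text
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms import OpenAI

service_context = ServiceContext.from_defaults(
    llm=OpenAI(model="gpt-3.5-turbo", temperature=0.1),
    chunk_size=600,    # assumed, based on the old chunk_size_limit
    chunk_overlap=20,  # assumed, based on the old max_chunk_overlap
)

documents = SimpleDirectoryReader(directory_path).load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)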
Your updated prompt and reconstructing the index seem to have done the trick. Thanks for all your help @WhiteFang_Jr