It's using the "DEFAULT_TEXT_QA_PROMPT_TMPL" template
0.74 is a common "base" similarity for openai embeddings, at least from my experience
You can try setting similarity_cutoff to something like 0.78, or it may just come down to prompt engineering
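(A minimal sketch of where that cutoff plugs in, assuming you're wiring up the response synthesizer yourself; exact import paths vary by llama_index version:)
# drop retrieved nodes scoring below 0.78 before the answer is synthesized
response_synthesizer = ResponseSynthesizer.from_args(
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.78)]
)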
In my experience, if you are using gpt 3.5, it's not great at following all the instructions lol
I see ;D. What model would you suggest for this?
I think text-davinci-003 (the default) strikes a good balance between cost and ability to follow instructions. But let me know what you find, it might still take some prompt engineering.
In my case, the node data I'm getting back in the response doesn't have any similarity score. Is that possible?
What kind of index do you have?
@Logan M I am having ComposableGraph built on GPTVectorStoreIndex
and GPTSimpleKeywordTableIndex
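Roughly this shape, for reference (the from_indices arguments and summaries below are illustrative, not my exact code):
# vector indices composed under a keyword table root; the summaries
# are what the graph uses to route a query to the right child index
graph = ComposableGraph.from_indices(
    GPTSimpleKeywordTableIndex,
    [vector_index_one, vector_index_two],
    index_summaries=["Summary of the first set of docs", "Summary of the second set of docs"],
)
query_engine = graph.as_query_engine()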
I think the missing score is caused by the graph. Something to do with the summaries, maybe? Not unexpected, I've seen that before
@Logan M Could you help me with the prompt engineering? =]
I'm trying to get a response with a similar format to the original document.
The best prompt I got so far is the below, but it isn't quite there yet:
QA_PROMPT_TMPL = (
    "Context information is below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the following question following a similar format to the context. "
    "Add (un)ordered lists if applicable. If you're unsure of the answer, say \"Sorry, I don't know\": {query_str}\n"
)
I want the response to have a similar format to the context
Have you set the refine template too?
Trying to find an example in the documentation...
I can give you an example. Are you using gpt3.5/4 or Davinci? (It's slightly different depending)
Davinci.
Appreciate your help!
That might be why you are struggling with the output formatting (since that prompt doesn't have your instructions yet)
I have changed that template, though:
QA_PROMPT_TMPL = (
    "Context information is below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the following question following a similar format to the context. "
    "Add (un)ordered lists if applicable. If you're unsure of the answer, say \"Sorry, I don't know\": {query_str}\n"
)
QA_PROMPT = QuestionAnswerPrompt(QA_PROMPT_TMPL)
# configure response synthesizer
response_synthesizer = ResponseSynthesizer.from_args(
    node_postprocessors=[
        SimilarityPostprocessor(similarity_cutoff=0.78)
    ],
    text_qa_template=QA_PROMPT
)
What am I missing =]
That's only the QA template. There are two -> text_qa_template and refine_template
The link above points to the default refine template
DEFAULT_REFINE_PROMPT_TMPL = (
    "The original question is as follows: {query_str}\n"
    "We have provided an existing answer: {existing_answer}\n"
    "We have the opportunity to refine the existing answer "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{context_msg}\n"
    "------------\n"
    "Given the new context, refine the original answer to better "
    "answer the question. "
    "If the context isn't useful, return the original answer."
)
DEFAULT_REFINE_PROMPT = RefinePrompt(DEFAULT_REFINE_PROMPT_TMPL)
DEFAULT_TEXT_QA_PROMPT_TMPL = (
    "Context information is below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question: {query_str}\n"
)
DEFAULT_TEXT_QA_PROMPT = QuestionAnswerPrompt(DEFAULT_TEXT_QA_PROMPT_TMPL)
# configure response synthesizer
response_synthesizer = ResponseSynthesizer.from_args(
    node_postprocessors=[
        SimilarityPostprocessor(similarity_cutoff=0.78)
    ],
    text_qa_template=DEFAULT_TEXT_QA_PROMPT,
    refine_template=DEFAULT_REFINE_PROMPT
)
That's an example of setting both
Hmm, got it! I'll play with that.
Thank you!
I wonder why it's cutting off the response on the last item.
I added this item to the QA template and refine prompt: "Keep the same context formatting and add ordered lists if it makes sense."
It might be reaching the max output? By default, openai will output 256 tokens
How do I work out the max output number, given that my documents have different sizes?
I guess I can just tell the model not to end the response out of nowhere ;D
If it was 256 (or near that), it didn't stop its response out of nowhere; OpenAI just stopped it from finishing its sentence because it reached the max output tokens
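(If you want to double-check that, you can count the tokens in the response text; a quick sketch with tiktoken, assuming the text-davinci-003 tokenizer:)
import tiktoken

# text-davinci-003 maps to the p50k_base encoding
enc = tiktoken.encoding_for_model("text-davinci-003")
num_tokens = len(enc.encode(str(response)))
# if this lands at (or just under) max_tokens, the output was cut off rather than finished
print(num_tokens)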
I set num_output to 700 and my response has 252 tokens (886 characters) and it's still getting cut off
If I add a "and don't stop the response out of nowhere" to the QA template, it will end the response correctly and will add more characters to it...
Sorry, I'm probably doing something dumb
Try something like this
# define prompt helper
# set maximum input size
max_input_size = 4096
# set number of output tokens
num_output = 512
# set maximum chunk overlap
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
# define LLM
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=num_output))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
It will still stop the final sentence before ending it. It does work fine if I remove the "Treat this as a knowledge base article.\n" line from the QA template, but then it displays the answer as a single paragraph. I'd like the response to look like a proper guide, though.
Happy to share the guide with you if you wanna take a look. It's in a Google Doc.
Hmm I'm more curious what your current setup is now lol (service context, loading index, query)
service_context = None
def construct_index():
    GoogleDriveReader = download_loader('GoogleDriveReader')
    loader = GoogleDriveReader()
    documents = loader.load_data(folder_id=folder_id)
    # define prompt helper
    # set maximum input size
    max_input_size = 4096
    # set number of output tokens
    num_output = 512
    # set maximum chunk overlap
    max_chunk_overlap = 20
    prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
    # define LLM
    # text-davinci-003
    # text-ada-001
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=num_output))
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
    # builds an index over the Google docs
    # index = GPTVectorStoreIndex.from_documents(documents)
    index = GPTVectorStoreIndex.from_documents(
        documents, service_context=service_context
    )
    # persists the index to disk (by default to ./storage) so that it can be used later
    index.storage_context.persist()
def ask_v():
    # rebuild storage context
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    # load index
    index = load_index_from_storage(storage_context, service_context=service_context)
    retriever = VectorIndexRetriever(index=index, similarity_top_k=1)
    DEFAULT_TEXT_QA_PROMPT_TMPL = (
        "Context information is below. \n"
        "---------------------\n"
        "{context_str}"
        "\n---------------------\n"
        "Given the context information and not prior knowledge, "
        "answer the question: {query_str}\nTreat this as a knowledge base article and don't end it out of nowhere.\n"
    )
    QA_PROMPT = QuestionAnswerPrompt(DEFAULT_TEXT_QA_PROMPT_TMPL)
    DEFAULT_REFINE_PROMPT_TMPL = (
        "The original question is as follows: {query_str}\n"
        "We have provided an existing answer: {existing_answer}\n"
        "We have the opportunity to refine the existing answer "
        "(only if needed) with some more context below.\n"
        "------------\n"
        "{context_msg}\n"
        "------------\n"
        "Given the new context, refine the original answer to better "
        "answer the question. Treat this as a knowledge base article. "
        "If the context isn't useful, return the original answer."
    )
    DEFAULT_REFINE_PROMPT = RefinePrompt(DEFAULT_REFINE_PROMPT_TMPL)
    response_synthesizer = ResponseSynthesizer.from_args(
        node_postprocessors=[
            SimilarityPostprocessor(similarity_cutoff=0.78)
        ],
        text_qa_template=QA_PROMPT,
        refine_template=DEFAULT_REFINE_PROMPT
    )
    # assemble query engine
    query_engine = RetrieverQueryEngine(
        retriever=retriever,
        response_synthesizer=response_synthesizer,
    )
    # query_engine = index.as_query_engine(similarity_top_k=1, retriever_mode="embedding")  # return data from 1 node only
    response = query_engine.query("How do I install the company browser?")
In the query engine constructor, maybe add the service context there as well
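Something like this, as a sketch (exactly where service_context is accepted can depend on your llama_index version; I'm assuming here that ResponseSynthesizer.from_args takes it):
# pass the same service context you used to build the index (the module-level one
# in your snippet is still None, so rebuild it here or have construct_index() return it)
response_synthesizer = ResponseSynthesizer.from_args(
    service_context=service_context,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.78)],
    text_qa_template=QA_PROMPT,
    refine_template=DEFAULT_REFINE_PROMPT,
)
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)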
That seems to have done the trick! Why is that needed?
Awesome! It has to inherit the service context from the index, otherwise the settings fall back to the defaults
Normally this gets set automatically with as_query_engine(), but since you aren't using that, you gotta do it manually
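For reference, the shortcut would look roughly like this (assuming as_query_engine forwards these kwargs in your version):
# as_query_engine() builds the retriever and response synthesizer for you
# and carries the index's service context along automatically
query_engine = index.as_query_engine(
    similarity_top_k=1,
    text_qa_template=QA_PROMPT,
    refine_template=DEFAULT_REFINE_PROMPT,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.78)],
)
response = query_engine.query("How do I install the company browser?")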
I see. Thanks so much for your help!