Prompt issues

Hi guys, when using any refine prompt, sometimes the output ends up being "The new context does not provide any additional information that would require a refinement of the original answer. The original answer remains accurate and complete."
rather than the original response. Any idea what's happening here? It's happening on both the Tree and the Simple Vector index.
------------
Given the new context, refine the original answer to better answer the question. If the context isn't useful, output the original answer again.
DEBUG:llama_index.indices.response.response_builder:> Refined response: The new context does not provide any additional information that would require a refinement of the original answer. The original answer remains accurate and complete.
Refined response: The new context does not provide any additional information that would require a refinement of the original answer. The original answer remains accurate and complete.
INFO:llama_index.token_counter.token_counter:> [query] Total LLM token usage: 9426 tokens

'Tree Index': {
    "query_str": user_question,
    # "mode": "S",
    "service_context": service_context,
    "verbose": True,
    # "use_async": True
},
'Simple Vector Index': {
    "query_str": user_question,
    "mode": "default",
    "response_mode": "tree_summarize",
    "similarity_top_k": 5,
    "service_context": service_context,
    "verbose": True,
    # "use_async": True
},
Yeaaaa are you using gpt-3.5? OpenAI seems to have downgraded the model recently, which is causing this problem.

When all the text doesn't fit into one LLM call, it refines the answer across many LLM calls.
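(Very roughly, that refine process looks like the sketch below. It's a simplified illustration of the create-and-refine loop, not llama_index's actual code, and the helper names are made up.)
Plain Text
# Simplified illustration of the "create and refine" response mode.
# Not llama_index's real implementation; qa_prompt/refine_prompt are hypothetical helpers.
def create_and_refine(query_str, text_chunks, llm):
    # First chunk: answer the question from scratch.
    answer = llm(qa_prompt(query_str=query_str, context_str=text_chunks[0]))
    # Every following chunk: ask the LLM to refine the existing answer,
    # which is where the "no refinement needed" responses can sneak in.
    for chunk in text_chunks[1:]:
        answer = llm(refine_prompt(
            query_str=query_str,
            existing_answer=answer,
            context_msg=chunk,
        ))
    return answer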

I've actually been working on a new refine template, if you want to test it. Let me grab the code
Plain Text
from langchain.prompts.chat import (
    AIMessagePromptTemplate,
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
)

from llama_index.prompts.prompts import RefinePrompt

# Refine Prompt
CHAT_REFINE_PROMPT_TMPL_MSGS = [
    HumanMessagePromptTemplate.from_template("{query_str}"),
    AIMessagePromptTemplate.from_template("{existing_answer}"),
    HumanMessagePromptTemplate.from_template(
        "I have more context below which can be used "
        "(only if needed) to update your previous answer.\n"
        "------------\n"
        "{context_msg}\n"
        "------------\n"
        "Given the new context, update the previous answer to better "
        "answer my previous query."
        "If the previous answer remains the same, repeat it verbatim. "
        "Never reference the new context or my previous query directly.",
    ),
]


CHAT_REFINE_PROMPT_LC = ChatPromptTemplate.from_messages(CHAT_REFINE_PROMPT_TMPL_MSGS)
CHAT_REFINE_PROMPT = RefinePrompt.from_langchain_prompt(CHAT_REFINE_PROMPT_LC)
...
index.query("my query", similarity_top_k=3, refine_template=CHAT_REFINE_PROMPT)


Just need to set the refine_template in the query kwargs you shared to use it in a graph
Yes I am! Oh, that would be great, I was going crazy trying to figure it out. I thought it was chunking. Does chunking happen automatically, or is it better to chunk the data during index creation?
Chunking happens during index construction, yeah. But the problem isn't entirely related to that 😅 just the LLM not following instructions.
Hopefully the above code helps. Feel free to try and tune it more too haha
Right now I'm doing chunk_limit = 1000. What do you generally use for create/refine?
Yea, that chunk size is fine 💪 especially for embeddings, that seems to be about the sweet spot.
And then for the service context I have the below. Anything jump out as incorrect?
max_input_size = 3000
num_output = 1000
max_chunk_overlap = 20
chunk_size_limit = 1024
# embed_model = LangchainEmbedding(HuggingFaceEmbeddings())

prompt_helper = PromptHelper(
    max_input_size=max_input_size,
    num_output=num_output,
    max_chunk_overlap=max_chunk_overlap,
    chunk_size_limit=chunk_size_limit,
)
llm_predictor = LLMPredictor(
    llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo", request_timeout=1500)
)

return ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    prompt_helper=prompt_helper,
    # embed_model=embed_model,
)
You can probably change the max input size back to 4096, unless you lowered it for a specific reason.

Also maybe set the chunk_size_limit in the service_context itself, in addition to the prompt helper.

Chunking happens twice, when the index is created, and during queries. Usually you'll want it to be the same size in both
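(Concretely, something like this. It's a sketch based on the snippets above, assuming the same 0.5.x-era llama_index API used elsewhere in the thread; the 4096 comes from the suggestion to restore the full input size.)
Plain Text
# Set chunk_size_limit on the service context as well as the prompt helper,
# and bump max_input_size back up to 4096 unless it was lowered on purpose.
prompt_helper = PromptHelper(
    max_input_size=4096,
    num_output=1000,
    max_chunk_overlap=20,
    chunk_size_limit=1024,
)

service_context = ServiceContext.from_defaults(
    llm_predictor=llm_predictor,
    prompt_helper=prompt_helper,
    chunk_size_limit=1024,  # same chunk size at index build time and at query time
)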
Also, I don't think it's picking up the new chat template:
'Tree Index': {
    "query_str": user_question,
    # "mode": "S",
    "service_context": service_context,
    "verbose": True,
    "refine_template": "CHAT_REFINE_PROMPT",
    "use_async": True
},
AttributeError: 'str' object has no attribute 'partial_format'
Hmm, did you copy every line I sent? There's kinda 3 steps/variables, the initial list of messages, the langchain prompt, and then the final llama index refine prompt that gets used in the kwargs
from langchain.prompts.chat import (
    AIMessagePromptTemplate,
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
)

from llama_index.prompts.prompts import RefinePrompt

# Refine Prompt
CHAT_REFINE_PROMPT_TMPL_MSGS = [
    HumanMessagePromptTemplate.from_template("{query_str}"),
    AIMessagePromptTemplate.from_template("{existing_answer}"),
    HumanMessagePromptTemplate.from_template(
        "I have more context below which can be used "
        "(only if needed) to update your previous answer.\n"
        "------------\n"
        "{context_msg}\n"
        "------------\n"
        "Given the new context, update the previous answer to better "
        "answer my previous query. "
        "If the previous answer remains the same, repeat it verbatim. "
        "Never reference the new context or my previous query directly.",
    ),
]


CHAT_REFINE_PROMPT_LC = ChatPromptTemplate.from_messages(CHAT_REFINE_PROMPT_TMPL_MSGS)
CHAT_REFINE_PROMPT = RefinePrompt.from_langchain_prompt(CHAT_REFINE_PROMPT_LC)
You put quotes around the variable name in the kwargs
"CHAT_REFINE_PROMPT"

just remove the quotes there
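(So the Tree Index kwargs from above would become something like this; the point is passing the prompt object itself rather than its name as a string:)
Plain Text
'Tree Index': {
    "query_str": user_question,
    "service_context": service_context,
    "verbose": True,
    "refine_template": CHAT_REFINE_PROMPT,  # the RefinePrompt object, no quotes
    "use_async": True
},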
ah stupid me hahaha
With this change though, the tree index is far more powerful.
With that in mind,

how can I set the summary text here? Right now it's just the first couple hundred characters, it seems:

>[Level 0] current prompt template: Some choices are given below. It is provided in a numbered list (1 to 4), where each item in the list corresponds to a summary.
...
Provide choice in the following format: 'ANSWER: <number>' and explain why this summary was selected in relation to the question.
Glad it helped! Maybe I should add that template in a PR lol a few people have found it to be better

So in a tree index, it builds a hierarchy of summaries automatically, so there's no way to set it specifically 🤔 Then during the query, it kind of traverses the tree, which is what you're seeing there.
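(Very roughly, that traversal looks like the sketch below. It's a simplified illustration only, not llama_index's actual implementation, and all the names are hypothetical.)
Plain Text
# Simplified illustration of a tree index query: at each level the LLM is shown
# the numbered child summaries (the "Some choices are given below..." prompt)
# and picks one branch to descend into. Names here are hypothetical.
def query_tree(root_nodes, query_str, llm):
    nodes = root_nodes
    while True:
        choice = llm(choice_prompt(
            summaries=[n.summary for n in nodes],
            query_str=query_str,
        ))
        selected = nodes[parse_answer_number(choice) - 1]  # "ANSWER: <number>"
        if selected.is_leaf():
            # At a leaf, answer the query from the underlying text chunk.
            return llm(qa_prompt(context_str=selected.text, query_str=query_str))
        nodes = selected.children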