It's using the "DEFAULT_TEXT_QA_PROMPT_TMPL" template
0.74 is a common "base" similarity for openai embeddings, at least from my experience
You can try setting similarity_cutoff to something like 0.78, or it may just come down to prompt engineering
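(A minimal sketch of where that cutoff plugs in, assuming you're wiring up the response synthesizer yourself; exact import paths vary by llama_index version:)
# drop retrieved nodes scoring below 0.78 before the answer is synthesized
response_synthesizer = ResponseSynthesizer.from_args(
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.78)]
)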
In my experience, if you are using gpt 3.5, it's not great at following all the instructions lol
I see ;D. What model would you suggest for this?
I think text-davinci-003 (the default) strikes a good balance between cost and ability to follow instructions. But let me know what you find, it might still take some prompt engineering.
In my case, the node data I'm getting back in the response doesn't have any similarity score. Is that possible?
What kind of index do you have?
@Logan M I am having ComposableGraph built on GPTVectorStoreIndex
and GPTSimpleKeywordTableIndex
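Roughly this shape, for reference (the from_indices arguments and summaries below are illustrative, not my exact code):
# vector indices composed under a keyword table root; the summaries
# are what the graph uses to route a query to the right child index
graph = ComposableGraph.from_indices(
    GPTSimpleKeywordTableIndex,
    [vector_index_one, vector_index_two],
    index_summaries=["Summary of the first set of docs", "Summary of the second set of docs"],
)
query_engine = graph.as_query_engine()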
I think the missing score is caused by the graph. Something to do with the summaries, maybe? Not unexpected, I've seen that before
@Logan M Could you help me with the prompt engineering? =]
I'm trying to get a response with a similar format to the original document.
The best prompt I got so far is the below, but it isn't quite there yet:
QA_PROMPT_TMPL = (
    "Context information is below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the following question following a similar format to the context. "
    "Add (un)ordered lists if applicable. If you're unsure of the answer, say \"Sorry, I don't know\": {query_str}\n"
)
I want the response to have a similar format to the context
Have you set the refine template too?
Trying to find an example in the documentation...
I can give you an example. Are you using gpt3.5/4 or Davinci? (It's slightly different depending)
Davinci.
Appreciate your help!
That might be why you are struggling with the output formatting (since that prompt doesn't have your instructions yet)
I have changed that template, though:
QA_PROMPT_TMPL = (
    "Context information is below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the following question following a similar format to the context. "
    "Add (un)ordered lists if applicable. If you're unsure of the answer, say \"Sorry, I don't know\": {query_str}\n"
)
QA_PROMPT = QuestionAnswerPrompt(QA_PROMPT_TMPL)
# configure response synthesizer
response_synthesizer = ResponseSynthesizer.from_args(
    node_postprocessors=[
        SimilarityPostprocessor(similarity_cutoff=0.78)
    ],
    text_qa_template=QA_PROMPT
)
What am I missing =]
That's only the QA template. There are two -> text_qa_template and refine_template
The link above points to the default refine template
DEFAULT_REFINE_PROMPT_TMPL = (
    "The original question is as follows: {query_str}\n"
    "We have provided an existing answer: {existing_answer}\n"
    "We have the opportunity to refine the existing answer "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{context_msg}\n"
    "------------\n"
    "Given the new context, refine the original answer to better "
    "answer the question. "
    "If the context isn't useful, return the original answer."
)
DEFAULT_REFINE_PROMPT = RefinePrompt(DEFAULT_REFINE_PROMPT_TMPL)
DEFAULT_TEXT_QA_PROMPT_TMPL = (
    "Context information is below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given the context information and not prior knowledge, "
    "answer the question: {query_str}\n"
)
DEFAULT_TEXT_QA_PROMPT = QuestionAnswerPrompt(DEFAULT_TEXT_QA_PROMPT_TMPL)
# configure response synthesizer
response_synthesizer = ResponseSynthesizer.from_args(
    node_postprocessors=[
        SimilarityPostprocessor(similarity_cutoff=0.78)
    ],
    text_qa_template=DEFAULT_TEXT_QA_PROMPT,
    refine_template=DEFAULT_REFINE_PROMPT
)
That's an example of setting both
Hmm, got it! I'll play with that.
Thank you!
I wonder why it's cutting off the response on the last item.
I added this item to the QA template and refine prompt: "Keep the same context formatting and add ordered lists if it makes sense."
It might be reaching the max output? By default, openai will output 256 tokens
How do I work out the max output number, given that my documents have different sizes?
I guess I can just tell the model not to end the response out of nowhere ;D
If it was 256 (or near that), it didn't stop its response out of nowhere; OpenAI just stopped it from finishing its sentence because it reached the max output tokens
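(If you want to double-check that, you can count the tokens in the response text; a quick sketch with tiktoken, assuming the text-davinci-003 tokenizer:)
import tiktoken

# text-davinci-003 maps to the p50k_base encoding
enc = tiktoken.encoding_for_model("text-davinci-003")
num_tokens = len(enc.encode(str(response)))
# if this lands at (or just under) max_tokens, the output was cut off rather than finished
print(num_tokens)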
I set num_output to 700 and my response has 252 tokens (886 characters) and it's still getting cut off
If I add a "and don't stop the response out of nowhere" to the QA template, it will end the response correctly and will add more characters to it...
Sorry, I'm probably doing something dumb
Try something like this
# define prompt helper
# set maximum input size
max_input_size = 4096
# set number of output tokens
num_output = 512
# set maximum chunk overlap
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
# define LLM
llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=num_output))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
It will still stop the final sentence before ending it. It does work fine if I remove the "Treat this as a knowledge base article.\n" line from the QA template, but then it displays the answer as a single paragraph. I'd like the response to look like a proper guide, though.
Happy to share the guide with you if you wanna take a look. It's in a Google Doc.
Hmm I'm more curious what your current setup is now lol (service context, loading index, query)
service_context = None
def construct_index():
    GoogleDriveReader = download_loader('GoogleDriveReader')
    loader = GoogleDriveReader()
    documents = loader.load_data(folder_id=folder_id)
    # define prompt helper
    # set maximum input size
    max_input_size = 4096
    # set number of output tokens
    num_output = 512
    # set maximum chunk overlap
    max_chunk_overlap = 20
    prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
    # define LLM
    # text-davinci-003
    # text-ada-001
    llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-003", max_tokens=num_output))
    service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
    # builds an index over the Google docs
    # index = GPTVectorStoreIndex.from_documents(documents)
    index = GPTVectorStoreIndex.from_documents(
        documents, service_context=service_context
    )
    # persists the index to disk (by default to ./storage) so that it can be used later
    index.storage_context.persist()
def ask_v():
    # rebuild storage context
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    # load index
    index = load_index_from_storage(storage_context, service_context=service_context)
    retriever = VectorIndexRetriever(index=index, similarity_top_k=1)
    DEFAULT_TEXT_QA_PROMPT_TMPL = (
        "Context information is below. \n"
        "---------------------\n"
        "{context_str}"
        "\n---------------------\n"
        "Given the context information and not prior knowledge, "
        "answer the question: {query_str}\nTreat this as a knowledge base article and don't end it out of nowhere.\n"
    )
    QA_PROMPT = QuestionAnswerPrompt(DEFAULT_TEXT_QA_PROMPT_TMPL)
    DEFAULT_REFINE_PROMPT_TMPL = (
        "The original question is as follows: {query_str}\n"
        "We have provided an existing answer: {existing_answer}\n"
        "We have the opportunity to refine the existing answer "
        "(only if needed) with some more context below.\n"
        "------------\n"
        "{context_msg}\n"
        "------------\n"
        "Given the new context, refine the original answer to better "
        "answer the question. Treat this as a knowledge base article. "
        "If the context isn't useful, return the original answer."
    )
    DEFAULT_REFINE_PROMPT = RefinePrompt(DEFAULT_REFINE_PROMPT_TMPL)
    response_synthesizer = ResponseSynthesizer.from_args(
        node_postprocessors=[
            SimilarityPostprocessor(similarity_cutoff=0.78)
        ],
        text_qa_template=QA_PROMPT,
        refine_template=DEFAULT_REFINE_PROMPT
    )
    # assemble query engine
    query_engine = RetrieverQueryEngine(
        retriever=retriever,
        response_synthesizer=response_synthesizer,
    )
    # query_engine = index.as_query_engine(similarity_top_k=1, retriever_mode="embedding")  # return data from 1 node only
    response = query_engine.query("How do I install the company browser?")
In the query engine constructor, maybe add the service context there as well
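Something like this, as a sketch (exactly where service_context is accepted can depend on your llama_index version; I'm assuming here that ResponseSynthesizer.from_args takes it):
# pass the same service context you used to build the index (the module-level one
# in your snippet is still None, so rebuild it here or have construct_index() return it)
response_synthesizer = ResponseSynthesizer.from_args(
    service_context=service_context,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.78)],
    text_qa_template=QA_PROMPT,
    refine_template=DEFAULT_REFINE_PROMPT,
)
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)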
That seems to have done the trick! Why is that needed?
Awesome! It has to inherit the service context from the index, otherwise the settings fall back to the defaults
Normally this gets set automatically with as_query_engine(), but since you aren't using that, you gotta do it manually
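For reference, the shortcut would look roughly like this (assuming as_query_engine forwards these kwargs in your version):
# as_query_engine() builds the retriever and response synthesizer for you
# and carries the index's service context along automatically
query_engine = index.as_query_engine(
    similarity_top_k=1,
    text_qa_template=QA_PROMPT,
    refine_template=DEFAULT_REFINE_PROMPT,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.78)],
)
response = query_engine.query("How do I install the company browser?")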
I see. Thanks so much for your help!