I have been experimenting with searching (fake) enterprise data and generating answers from the search results, and I find that the REFINE response mode is very slow: SIMPLE_SUMMARIZE takes under 7 s, while REFINE takes ~40 s, presumably because it makes one sequential LLM call per retrieved node. Where can I find more information on query performance, and are there response modes that improve on SIMPLE_SUMMARIZE without making ~6 sequential LLM calls? My index is a vector store, but it doesn't have to be. Here is my setup:
# Imports assumed for llama_index ~0.6 -- module paths vary by version.
from llama_index.prompts.prompts import QuestionAnswerPrompt, RefinePrompt
from llama_index.prompts.chat_prompts import CHAT_REFINE_PROMPT_TMPL
from llama_index.indices.response import ResponseMode

# prompt_prefix_template is my own QA template string, defined elsewhere;
# the .format() calls leave the {context_str}/{query_str} placeholders intact.
qa_prompt = QuestionAnswerPrompt(prompt_prefix_template.format(
    context_str="{context_str}", query_str="{query_str}"))

refine_template_string = CHAT_REFINE_PROMPT_TMPL.format(
    context_msg="{context_msg}", query_str="{query_str}",
    existing_answer="{existing_answer}")
my_refine_prompt = RefinePrompt(refine_template_string)

# REFINE: one sequential LLM call per retrieved node (6 here).
query_engine_refine = index.as_query_engine(
    text_qa_template=qa_prompt, refine_template=my_refine_prompt,
    response_mode=ResponseMode.REFINE, similarity_top_k=6)

# SIMPLE_SUMMARIZE: all retrieved text truncated into a single LLM call.
query_engine_simple = index.as_query_engine(
    text_qa_template=qa_prompt,
    response_mode=ResponseMode.SIMPLE_SUMMARIZE, similarity_top_k=4)
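For what it's worth, the latency gap matches the call pattern: with similarity_top_k=6, REFINE chains six LLM round trips, whereas a mode that packs several retrieved chunks into each prompt would need only one or two. Here is a toy, pure-Python sketch of that packing idea (all names are mine, not library API, and the word count is a crude stand-in for a real tokenizer):

```python
# Sketch: instead of one refine call per retrieved chunk, pack as many chunks
# as fit under a token budget into each prompt, so k chunks need far fewer
# LLM calls. Word count is a crude token estimate for illustration only.

def pack_chunks(chunks, budget_tokens=3000):
    """Greedily group chunks into prompt-sized batches."""
    batches, current, used = [], [], 0
    for chunk in chunks:
        size = len(chunk.split())  # crude token estimate
        if current and used + size > budget_tokens:
            batches.append("\n\n".join(current))
            current, used = [], 0
        current.append(chunk)
        used += size
    if current:
        batches.append("\n\n".join(current))
    return batches

# Six ~1000-"token" chunks collapse into two prompts instead of six refine calls.
chunks = [("word " * 1000).strip() for _ in range(6)]
print(len(pack_chunks(chunks)))  # -> 2
```

So the question is really whether there is a built-in mode that batches like this, rather than refining chunk-by-chunk.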