

I have been experimenting with searching (fake) enterprise data and generating answers from the search results, and I find that the REFINE method is very slow: SIMPLE_SUMMARIZE takes <7s while REFINE takes ~40s. Where can I find more information on query performance, and what are some ways to improve on SIMPLE_SUMMARIZE that don't involve ~6 sequential calls to the LLM? My index is a vector store, but it doesn't have to be...

Python
# Imports for the legacy llama_index API; exact module paths vary by version.
from llama_index.prompts.prompts import QuestionAnswerPrompt, RefinePrompt
from llama_index.prompts.chat_prompts import CHAT_REFINE_PROMPT_TMPL
from llama_index.response_synthesizers import ResponseMode

# prompt_prefix_template is a custom template defined elsewhere; .format()
# fills in the prefix while keeping the library's placeholders intact.
qa_prompt = QuestionAnswerPrompt(prompt_prefix_template.format(
    context_str="{context_str}",
    query_str="{query_str}"))

# Same trick for the refine template.
refine_template_string = CHAT_REFINE_PROMPT_TMPL.format(
    context_msg="{context_msg}",
    query_str="{query_str}",
    existing_answer="{existing_answer}")
my_refine_prompt = RefinePrompt(refine_template_string)

query_engine_refine = index.as_query_engine(
    text_qa_template=qa_prompt, refine_template=my_refine_prompt,
    response_mode=ResponseMode.REFINE, similarity_top_k=6)
query_engine_simple = index.as_query_engine(
    text_qa_template=qa_prompt,
    response_mode=ResponseMode.SIMPLE_SUMMARIZE, similarity_top_k=4)
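
For reference, a rough way to reproduce the timing comparison above, once both engines are built (the query string here is a made-up example):

Python
import time

# Rough latency comparison of the two engines built above.
for name, engine in [("SIMPLE_SUMMARIZE", query_engine_simple),
                     ("REFINE", query_engine_refine)]:
    start = time.perf_counter()
    engine.query("What is the company travel policy?")  # made-up query
    print(f"{name}: {time.perf_counter() - start:.1f}s")
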
4 comments
So, setting the REFINE mode will always make one LLM call per top-k node (so yea, 6 in this case)
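
Conceptually, the refine loop runs once per retrieved node, each call waiting on the previous answer. A rough sketch with stand-in names (llm, nodes, query, and the two templates are illustrative, not the library's actual internals):

Python
# Pseudocode sketch of REFINE: one sequential LLM call per retrieved node.
answer = llm(qa_template.format(context_str=nodes[0].text, query_str=query))
for node in nodes[1:]:
    answer = llm(refine_template.format(
        context_msg=node.text, query_str=query, existing_answer=answer))
# similarity_top_k=6 -> 6 sequential calls, each blocked on the last answer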

An easy way to speed this up is to use COMPACT (the default mode), which will reduce the number of calls, especially if you adjust the chunk size

service_context = ServiceContext.from_defaults(..., chunk_size=512) will likely get this down to a single LLM call with COMPACT and a top-k of 6
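
Putting that together, a minimal sketch against the same legacy API (import paths vary by version; documents stands in for your own data, and chunk_size applies when the index is built, so the index has to be rebuilt):

Python
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.response_synthesizers import ResponseMode

# Smaller chunks let COMPACT pack all 6 retrieved chunks into one prompt.
service_context = ServiceContext.from_defaults(chunk_size=512)

# chunk_size takes effect at build time, so rebuild the index with it.
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# COMPACT is the default response_mode; it is spelled out here for clarity.
query_engine_compact = index.as_query_engine(
    response_mode=ResponseMode.COMPACT,
    similarity_top_k=6)
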
yea, thanks for pointing that out. Will make a point to update this