

I have been experimenting with searching (fake) enterprise data and generating answers from the search results, and I find that the REFINE method is very slow: SIMPLE_SUMMARIZE takes <7s while REFINE takes ~40s. Where can I find more information on query performance, and what are some ways to improve on SIMPLE_SUMMARIZE that don't involve ~6 sequential calls to the LLM? My index is a vector store, but it doesn't have to be...

Python
# Imports for the legacy llama_index API; exact module paths vary by version.
from llama_index.prompts.prompts import QuestionAnswerPrompt, RefinePrompt
from llama_index.prompts.chat_prompts import CHAT_REFINE_PROMPT_TMPL
from llama_index.response_synthesizers import ResponseMode

# prompt_prefix_template is a custom template defined elsewhere; .format()
# fills in the prefix while keeping the library's placeholders intact.
qa_prompt = QuestionAnswerPrompt(prompt_prefix_template.format(
    context_str="{context_str}",
    query_str="{query_str}"))

# Same trick for the refine template.
refine_template_string = CHAT_REFINE_PROMPT_TMPL.format(
    context_msg="{context_msg}",
    query_str="{query_str}",
    existing_answer="{existing_answer}")
my_refine_prompt = RefinePrompt(refine_template_string)

query_engine_refine = index.as_query_engine(
    text_qa_template=qa_prompt, refine_template=my_refine_prompt,
    response_mode=ResponseMode.REFINE, similarity_top_k=6)
query_engine_simple = index.as_query_engine(
    text_qa_template=qa_prompt,
    response_mode=ResponseMode.SIMPLE_SUMMARIZE, similarity_top_k=4)
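
For reference, a rough way to reproduce the timing comparison above, once both engines are built (the query string here is a made-up example):

Python
import time

# Rough latency comparison of the two engines built above.
for name, engine in [("SIMPLE_SUMMARIZE", query_engine_simple),
                     ("REFINE", query_engine_refine)]:
    start = time.perf_counter()
    engine.query("What is the company travel policy?")  # made-up query
    print(f"{name}: {time.perf_counter() - start:.1f}s")
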
4 comments
So, setting the REFINE mode will always make one LLM call per top-k node (so yea, 6 in this case)
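
Conceptually, the refine loop runs once per retrieved node, each call waiting on the previous answer. A rough sketch with stand-in names (llm, nodes, query, and the two templates are illustrative, not the library's actual internals):

Python
# Pseudocode sketch of REFINE: one sequential LLM call per retrieved node.
answer = llm(qa_template.format(context_str=nodes[0].text, query_str=query))
for node in nodes[1:]:
    answer = llm(refine_template.format(
        context_msg=node.text, query_str=query, existing_answer=answer))
# similarity_top_k=6 -> 6 sequential calls, each blocked on the last answer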

An easy way to speed this up is to use COMPACT (the default mode), which will reduce the number of calls, especially if you adjust the chunk size

service_context = ServiceContext.from_defaults(..., chunk_size=512) will likely get this down to a single LLM call with COMPACT and a top-k of 6
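
Putting that together, a minimal sketch against the same legacy API (import paths vary by version; documents stands in for your own data, and chunk_size applies when the index is built, so the index has to be rebuilt):

Python
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.response_synthesizers import ResponseMode

# Smaller chunks let COMPACT pack all 6 retrieved chunks into one prompt.
service_context = ServiceContext.from_defaults(chunk_size=512)

# chunk_size takes effect at build time, so rebuild the index with it.
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# COMPACT is the default response_mode; it is spelled out here for clarity.
query_engine_compact = index.as_query_engine(
    response_mode=ResponseMode.COMPACT,
    similarity_top_k=6)
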
yea, thanks for pointing that out. Will make a point to update this