
Updated 10 months ago


@kapa.ai With RetrieverQueryEngine and response_mode set to "tree_summarize", returning the top 10 nodes causes an issue: the token count exceeds 4096.
3 comments
Please stand by @mikeLiu as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
The tree_summarize response mode in LlamaIndex's RetrieverQueryEngine takes the retrieved Node objects and the query, recursively constructs a tree of intermediate summaries, and returns the root node's summary as the response. It's good for summarization purposes. However, if the combined tokens from the top 10 nodes exceed the model's context window (4096 tokens for many models), you can encounter this error.
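To make the recursion concrete, here is a minimal sketch of the tree-summarize idea in plain Python. It is not LlamaIndex's implementation: summarize() is a stand-in for an LLM call (it just joins and truncates text), and the fan_in parameter is an illustrative assumption for how many child summaries feed each parent.

```python
def summarize(texts, max_chars=40):
    # Stand-in for an LLM summarization call: join the inputs
    # and truncate, so each level produces a bounded-size summary.
    return " ".join(texts)[:max_chars]

def tree_summarize(chunks, fan_in=2):
    """Recursively summarize chunks in groups of `fan_in`
    until a single root summary remains."""
    if len(chunks) == 1:
        return chunks[0]
    parents = [
        summarize(chunks[i:i + fan_in])
        for i in range(0, len(chunks), fan_in)
    ]
    return tree_summarize(parents, fan_in)

# 10 retrieved nodes collapse level by level: 10 -> 5 -> 3 -> 2 -> 1.
root = tree_summarize([f"node {i} text" for i in range(10)])
```

Because each level only ever summarizes `fan_in` bounded-size summaries at a time, no single call has to fit all 10 nodes into one prompt, which is the property that makes the tree approach attractive for large result sets.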

To handle this, LlamaIndex provides different response modes that you can use based on your specific needs:

  • default: This mode creates and refines an answer by sequentially going through each retrieved Node. It makes a separate LLM call per Node, which is good for more detailed answers.
  • compact: This mode compacts the prompt during each LLM call by stuffing as many Node text chunks that can fit within the maximum prompt size. If there are too many chunks to stuff in one prompt, it creates and refines an answer by going through multiple prompts.
  • simple_summarize: This mode truncates all text chunks to fit into a single LLM prompt. It's good for quick summarization purposes, but may lose detail due to truncation.
  • no_text: This mode only runs the retriever to fetch the nodes that would have been sent to the LLM, without actually sending them. Then they can be inspected by checking response.source_nodes.
  • accumulate: This mode applies the query to each Node text chunk while accumulating the responses into an array. It returns a concatenated string of all responses. It's good for when you need to run the same query separately against each text chunk.
  • compact_accumulate: This mode is the same as accumulate, but will “compact” each LLM prompt similar to compact, and run the same query against each text chunk.
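The "compact" behavior described above amounts to greedily packing as many chunks as fit under a token budget before starting a new prompt. Here is a small illustrative sketch of that packing step; the whitespace-split token counter and the function name are assumptions for the example, not LlamaIndex internals.

```python
def pack_chunks(chunks, budget, count_tokens=lambda s: len(s.split())):
    """Greedily pack text chunks into prompts so that each prompt
    stays within `budget` tokens (compact-style batching)."""
    prompts, current, used = [], [], 0
    for chunk in chunks:
        n = count_tokens(chunk)
        # Start a new prompt when adding this chunk would exceed the budget.
        if current and used + n > budget:
            prompts.append("\n".join(current))
            current, used = [], 0
        current.append(chunk)
        used += n
    if current:
        prompts.append("\n".join(current))
    return prompts

chunks = ["alpha beta", "gamma delta epsilon", "zeta", "eta theta"]
batches = pack_chunks(chunks, budget=4)
# Fewer prompts than chunks: the synthesizer refines across the
# packed batches instead of making one LLM call per node.
```

With a real tokenizer and the model's actual context size in place of the toy counter, this is the intuition behind why compact usually needs fewer LLM calls than the per-node default mode.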
You can set the response_mode when creating the RetrieverQueryEngine as shown in the code below:
Python
# Assumes the LlamaIndex v0.10+ package layout; `retriever` and the
# optional `reranker` postprocessor are defined elsewhere in your code.
from llama_index.core import get_response_synthesizer
from llama_index.core.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine.from_args(
    retriever=retriever,
    response_synthesizer=get_response_synthesizer(
        response_mode="your_choice_of_response_mode",
    ),
    node_postprocessors=[reranker],
)


You can find more details about these response modes in the LlamaIndex documentation.
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------