----------------------
Sure, here's a brief overview of the response modes you asked about in LlamaIndex:
- Refine: This is an iterative way of generating a response. The context in the first node, along with the query, is used to generate an initial answer. That answer, the query, and the context of the second node are then fed into a “refine prompt” to produce a refined answer. This process repeats across the remaining N-1 nodes, where N is the total number of nodes.
- Compact: This mode first combines text chunks into larger consolidated chunks that more fully utilize the available context window, then refines answers across them. This mode is faster than refine since fewer calls are made to the LLM.
- Tree Summarize: This mode builds a tree index over the set of candidate nodes, with a summary prompt seeded with the query. The tree is built in a bottom-up fashion, and at the end, the root node is returned as the response.
- Accumulate: This mode synthesizes a separate response for each text chunk and then concatenates those responses into the final answer.
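To make the refine mode above concrete, here is a minimal sketch of its loop in plain Python. This is not LlamaIndex's implementation; the `llm` callable and the prompt wording are stand-ins for the library's real LLM interface and refine prompt template.

```python
def refine_synthesize(query, chunks, llm):
    """Sketch of the "refine" response mode: one initial answer,
    then one refine step per remaining chunk (N-1 refine calls)."""
    # First chunk + query -> initial answer
    answer = llm(f"Context: {chunks[0]}\nQuestion: {query}\nAnswer:")
    # Each remaining chunk refines the running answer
    for chunk in chunks[1:]:
        answer = llm(
            f"Existing answer: {answer}\n"
            f"New context: {chunk}\n"
            f"Question: {query}\n"
            "Refine the existing answer using the new context:"
        )
    return answer
```

In LlamaIndex itself you would not write this loop by hand; you select the mode when building a query engine, e.g. `index.as_query_engine(response_mode="compact")` (exact parameter names can vary between versions).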