REFINE is a mode of LlamaIndex's response synthesizers. In REFINE mode, the system generates a response in an iterative manner. It first uses the context in the first node, along with the query, to generate an initial answer. This answer, the query, and the context of the second node are then passed as input into a “refine prompt” to generate a refined answer. This process is repeated through N-1 nodes, where N is the total number of nodes. Use REFINE mode to iteratively improve the response.
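As a rough sketch of that loop (illustrative only, not LlamaIndex's actual implementation; the `complete` callable, the prompt wording, and `node_texts` are hypothetical stand-ins):

```python
from typing import Callable, List

def refine_answer(query: str, node_texts: List[str], complete: Callable[[str], str]) -> str:
    """Generate an initial answer from the first node, then refine it over the rest."""
    # Initial answer from the first node's context plus the query.
    answer = complete(f"Context:\n{node_texts[0]}\n\nQuestion: {query}\nAnswer:")
    # One refine call per remaining node: N-1 refinements for N nodes.
    for text in node_texts[1:]:
        answer = complete(
            f"Existing answer: {answer}\n\n"
            f"New context:\n{text}\n\n"
            f"Question: {query}\n"
            "Refine the existing answer using the new context:"
        )
    return answer
```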
The COMPACT mode in LlamaIndex's response synthesizers is similar to the REFINE mode, but it first combines text chunks into larger consolidated chunks that more fully utilize the available context window, then refines answers across them. This mode is faster than REFINE since it makes fewer calls to the LLM.
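In practice, switching between the two modes is usually just a matter of passing a different response_mode; a minimal sketch, assuming a recent llama-index release, a local "data" folder of documents, and a configured default LLM:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Build a small index over local documents (the "data" folder is a placeholder).
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Same retrieval step, different synthesis strategy.
refine_engine = index.as_query_engine(response_mode="refine")
compact_engine = index.as_query_engine(response_mode="compact")

print(compact_engine.query("What are the key findings?"))
```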
In COMPACT mode, the system packs as much text (concatenated from the retrieved chunks) as can fit within the context window. If the combined text is too long to fit in one prompt, it is split into as many parts as needed. Each part is treated as a "chunk" and is sent to the REFINE synthesizer. So it behaves like REFINE, but with fewer LLM calls. Use COMPACT mode to iteratively improve the response while keeping the number of LLM calls down.
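The same choice is also available directly at the synthesizer level; a sketch under the assumption that the import paths match a recent llama-index version and a default LLM is configured (the example nodes are made up):

```python
from llama_index.core import get_response_synthesizer
from llama_index.core.response_synthesizers import ResponseMode
from llama_index.core.schema import NodeWithScore, TextNode

# COMPACT packs the retrieved texts into as few prompts as possible,
# then runs the REFINE loop over the resulting parts.
synthesizer = get_response_synthesizer(response_mode=ResponseMode.COMPACT)

nodes = [
    NodeWithScore(node=TextNode(text="First retrieved chunk of context."), score=0.9),
    NodeWithScore(node=TextNode(text="Second retrieved chunk of context."), score=0.7),
]
response = synthesizer.synthesize("What does the context say?", nodes=nodes)
print(response)
```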