Why is my model generating twice? I mean it's refining the previous response

At a glance

A community member observes that their model generates a response and then refines it. The replies from @kapa.ai explain that this behavior comes from the REFINE mode of LlamaIndex's response synthesizers, in which the system generates an initial answer and then iteratively refines it using the context of each subsequent node. When the community member mentions they are using the COMPACT mode, the replies add that COMPACT works like REFINE but first packs text chunks into larger consolidated chunks before refining across them, resulting in fewer LLM calls and faster processing.

@kapa.ai why is my model generating twice? I mean, it's refining the previous response
Attachment: Screenshot_2024-06-27_at_8.09.54_PM.png
Please stand by @bidda7287 as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
The behavior you're observing is due to the REFINE mode of LlamaIndex's response synthesizers. In REFINE mode, the system generates a response iteratively. It first uses the context of the first node, along with the query, to generate an initial answer. That answer, the query, and the context of the second node are then fed into a "refine prompt" to generate a refined answer. This step is repeated through the remaining N-1 nodes, where N is the total number of retrieved nodes.
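As a rough illustration of that loop, here is a conceptual sketch only, not LlamaIndex's internal code; `refine_synthesize` and `llm` (a prompt-to-text callable) are hypothetical names for this example:

```python
# Conceptual sketch of the REFINE loop described above -- not LlamaIndex's
# actual internals. `llm` is a hypothetical callable mapping a prompt to text.
def refine_synthesize(query: str, node_texts: list[str], llm) -> str:
    # First node: generate the initial answer from a QA-style prompt.
    answer = llm(f"Context: {node_texts[0]}\nQuestion: {query}\nAnswer:")
    # Remaining N-1 nodes: each pass feeds the running answer, the query,
    # and the next node's context into a refine-style prompt.
    for text in node_texts[1:]:
        answer = llm(
            f"Existing answer: {answer}\n"
            f"New context: {text}\n"
            f"Refine the existing answer to the question: {query}"
        )
    return answer
```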

This is why you're seeing the model generate a response, and then refine that response. It's part of the design of the REFINE mode to iteratively improve the response.
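If you want to reproduce this explicitly, here is a minimal sketch of selecting REFINE mode on a query engine. It assumes a recent `llama-index` version (imports under `llama_index.core`), a local `data/` directory of documents, and an LLM already configured in your environment; the query string is a placeholder:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Build a simple index over local documents (assumes a "data/" directory).
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# response_mode="refine" answers with the first retrieved node, then runs
# each remaining node through the refine prompt -- hence the two-pass output.
query_engine = index.as_query_engine(response_mode="refine")
print(query_engine.query("What does the document say about X?"))
```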

You can find more details about this in the LlamaIndex documentation.
----------------------
Note: you can ask me a follow-up question by @-mentioning me again :speech_balloon:
----------------------
@kapa.ai I'm using the COMPACT response mode
Let me check the knowledge base for you. One moment! :books:
----------------------
The COMPACT mode in LlamaIndex's response synthesizers is similar to the REFINE mode, but it first combines text chunks into larger consolidated chunks that more fully utilize the available context window, then refines answers across them. This mode is faster than REFINE since it makes fewer calls to the LLM.

In COMPACT mode, the system stuffs as much text (concatenated/packed from the retrieved chunks) as can fit within the context window. If the text is too long to fit in one prompt, it is split into as many parts as needed. Each part is treated as a "chunk" and sent to the REFINE synthesizer. So it works like REFINE, but with fewer LLM calls.

This is why you're seeing the model generate a response and then refine it: whenever the packed text spans more than one prompt, COMPACT still refines the answer iteratively, just across fewer, larger chunks.
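For comparison, here is a minimal sketch, under the same assumptions as the earlier example, that switches to COMPACT mode:

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# "compact" packs the retrieved chunks to fill the context window first; it
# only refines across parts if the packed text overflows a single prompt.
query_engine = index.as_query_engine(response_mode="compact")
print(query_engine.query("What does the document say about X?"))
```

If everything fits in one prompt, COMPACT makes a single LLM call and you should not see a separate refinement pass at all.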

You can find more details about this in the LlamaIndex documentation.
----------------------
Note: you can ask me a follow-up question by @-mentioning me again :speech_balloon:
----------------------