
Why is my model generating twice? It seems to be refining its previous response

@kapa.ai Why is my model generating twice? It seems to be refining its previous response.
Attachment: Screenshot_2024-06-27_at_8.09.54_PM.png
Please stand by @bidda7287 as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
The behavior you're observing comes from the REFINE mode of LlamaIndex's response synthesizers. In REFINE mode, the response is generated iteratively: the system first uses the context in the first node, along with the query, to generate an initial answer. That answer, the query, and the context of the second node are then fed into a "refine prompt" to produce a refined answer. This refine step is repeated for each of the remaining N-1 nodes, where N is the total number of retrieved nodes.

This is why you're seeing the model generate a response, and then refine that response. It's part of the design of the REFINE mode to iteratively improve the response.
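
For reference, here is a minimal sketch of selecting REFINE mode explicitly. It assumes the llama-index-core (>= 0.10) package layout and a `./data` directory of documents; both are assumptions for illustration, not details from this thread:

```python
# Minimal sketch: explicitly selecting REFINE mode (assumes llama-index-core
# >= 0.10 and a ./data directory of documents; adjust for your setup).
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core import get_response_synthesizer
from llama_index.core.response_synthesizers import ResponseMode

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# REFINE makes one LLM call per retrieved node: the first call drafts an
# initial answer, and each subsequent call refines that answer with the
# next node's context -- hence the "generated twice" effect.
synthesizer = get_response_synthesizer(response_mode=ResponseMode.REFINE)
query_engine = index.as_query_engine(response_synthesizer=synthesizer)

print(query_engine.query("Your question here"))
```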

You can find more details about this in the LlamaIndex documentation.
----------------------
Note: you can ask me a follow-up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai I'm using response mode as compact
Let me check the knowledge base for you. One moment! :books:
----------------------
The COMPACT mode in LlamaIndex's response synthesizers is similar to the REFINE mode, but it first combines text chunks into larger consolidated chunks that more fully utilize the available context window, then refines answers across them. This mode is faster than REFINE since it makes fewer calls to the LLM.

In COMPACT mode, the system packs as much of the retrieved text (concatenated from the retrieved chunks) as can fit within the context window into a single prompt. If the text is too long for one prompt, it is split into as many parts as needed; each part is treated as a "chunk" and sent to the REFINE synthesizer. So it works like REFINE, but with fewer LLM calls.

This is why you can still see the model generate a response and then refine it even in COMPACT mode: whenever the packed text spans more than one prompt, the same iterative refinement as in REFINE kicks in.
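
As a rough illustration (a sketch assuming the same `index` object as in the snippet above, not code from this thread), COMPACT can be requested directly on the query engine, and refinement only becomes visible when the packed text overflows a single prompt:

```python
# Minimal sketch: COMPACT mode (assumes `index` is an existing
# VectorStoreIndex; string response modes are accepted here).
query_engine = index.as_query_engine(response_mode="compact")

# The retrieved chunks are packed into as few prompts as the context
# window allows. One prompt -> a single LLM call, no visible refinement;
# more than one prompt -> each extra call refines the previous answer,
# which looks like the model "answering twice".
print(query_engine.query("Your question here"))
```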

You can find more details about this in the LlamaIndex documentation.
----------------------
Note: you can ask me a follow-up question by @ mentioning me again :speech_balloon:
----------------------