Can anyone tell me how LlamaIndex sends the content of the Nodes to the LLM? Does it do so by filling the {context_str} variable in the retriever prompt template? I searched the code but couldn't find where this happens.
Depending on the response mode (refine, compact, etc.), the retrieved chunks are either concatenated or kept as-is, producing a "new" set of chunks (or just the original ones).
For example, if you retrieved 3 chunks, the "refine" synthesizer will call the LLM 3 times: once with the qa_template for the first chunk, then once with the refine_template for each subsequent chunk. With "compact", it will first try to fit all 3 chunks within the LLM's context window, reducing the number of calls.
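To make the difference concrete, here's a rough sketch of the two call patterns. This is illustrative only, not LlamaIndex's actual implementation: `fake_llm`, the template strings, and the character-based window counting are all made up for the example (the real library counts tokens, not characters).

```python
# Hypothetical templates with the {context_str} placeholder, mimicking
# the shape of LlamaIndex's qa_template / refine_template.
QA_TEMPLATE = "Context:\n{context_str}\n\nQuestion: {query_str}\nAnswer:"
REFINE_TEMPLATE = (
    "Existing answer: {existing_answer}\n"
    "New context:\n{context_str}\n"
    "Refine the answer to: {query_str}"
)

def fake_llm(prompt):
    # Stand-in for a real LLM call.
    return "answer"

def refine(chunks, query):
    """One LLM call per chunk: qa_template first, refine_template after."""
    answer, calls = None, 0
    for chunk in chunks:
        if answer is None:
            prompt = QA_TEMPLATE.format(context_str=chunk, query_str=query)
        else:
            prompt = REFINE_TEMPLATE.format(
                existing_answer=answer, context_str=chunk, query_str=query
            )
        answer = fake_llm(prompt)
        calls += 1
    return answer, calls

def compact(chunks, query, window=300):
    """Pack chunks into as few context blocks as fit, then refine over those."""
    # Room left for context once the template and query are accounted for.
    budget = window - len(QA_TEMPLATE.format(context_str="", query_str=query))
    packed, current = [], ""
    for chunk in chunks:
        if current and len(current) + len(chunk) + 1 > budget:
            packed.append(current)
            current = chunk
        else:
            current = chunk if not current else current + "\n" + chunk
    if current:
        packed.append(current)
    return refine(packed, query)
```

With 3 retrieved chunks that all fit in one window, `refine` makes 3 LLM calls while `compact` packs them into a single context block and makes just 1.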
If the original chunks or the concatenated chunks don't fit the window (given a prompt), they are split, resulting in a new set of chunks. Those final chunks are what actually get sent to the LLM, filling the {context_str} placeholder in the prompt template.
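Sketched out, that last step might look like this. Again, this is a simplified illustration, not the library's real code: `build_prompts` is a hypothetical helper, and it measures the window in characters rather than tokens to keep the example self-contained.

```python
# Illustrative sketch of re-splitting an oversized context and filling
# the {context_str} placeholder before each LLM call.
QA_TEMPLATE = "Context information is below.\n{context_str}\nQuery: {query_str}\nAnswer:"

def build_prompts(chunks, query, window=120):
    # Space left for context after subtracting the template and the query.
    budget = window - len(QA_TEMPLATE.format(context_str="", query_str=query))
    combined = "\n".join(chunks)
    if len(combined) <= budget:
        pieces = [combined]  # everything fits: one prompt, one LLM call
    else:
        # Too big: re-split into window-sized pieces; each piece becomes
        # its own prompt (and its own LLM call).
        pieces = [combined[i:i + budget] for i in range(0, len(combined), budget)]
    return [QA_TEMPLATE.format(context_str=p, query_str=query) for p in pieces]
```

Each returned prompt is guaranteed to fit the window, and it's these pieces, not necessarily the originally retrieved nodes, that end up in {context_str}.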