
Nodes

Can anyone tell me how LlamaIndex sends the content of the Nodes to the LLM? Does it do so by filling the {context_str} variable in the retriever prompt template? I searched the code but could not find where this happens.
That's pretty much it. Retrieve nodes from the index, then format them to ensure you don't exceed the LLM's max input size.
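A minimal sketch of that flow in plain Python (the template shape mirrors LlamaIndex's default QA prompt, but `build_prompt` and the node list are simplified stand-ins, not the library's actual internals):

```python
# Sketch: how retrieved node text can be joined and substituted into a
# prompt template's {context_str} placeholder before the LLM call.

QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information, answer the query: {query_str}\n"
)

def build_prompt(nodes: list[str], query: str) -> str:
    # Concatenate the retrieved node text to form context_str.
    context_str = "\n\n".join(nodes)
    return QA_TEMPLATE.format(context_str=context_str, query_str=query)

nodes = ["Paris is the capital of France.", "France is in Europe."]
prompt = build_prompt(nodes, "What is the capital of France?")
print(prompt)
```

The real synthesizers also measure the formatted prompt against the model's context window and repack or split the node text if it doesn't fit.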
Thanks! And I think I even found the code in /llama_index/response_synthesizers/refine.py πŸ™‚
Depending on the response mode (refine, compact, etc), the retrieved chunks are concatenated (or not), creating a "new" set of chunks (or just the original one).

For example, if you retrieved 3 chunks, the "refine" synthesizer will call the LLM 3 times (using the qa_template and refine_template). But with "compact", it will try to fit the 3 chunks within the context window of the LLM.

If the actual chunks or the concatenated chunks don't fit the window (given a prompt), they are split, resulting in a new set of chunks. It is those chunks that are actually sent to the LLM, replacing the {context_str} placeholder.
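A toy illustration of the refine-vs-compact difference (plain Python with a stubbed LLM and a character-based "window"; these are simplified stand-ins, not the real synthesizer code):

```python
# Toy sketch contrasting "refine" (one LLM call per chunk) with "compact"
# (repack chunks to fill the context window, then fewer LLM calls).
# `fake_llm` and WINDOW are stand-ins for the real LLM and its token limit.

WINDOW = 100  # pretend context window, measured in characters for simplicity

def fake_llm(prompt: str) -> str:
    # Stub: a real synthesizer would call the LLM here.
    return f"<answer based on {len(prompt)} chars of prompt>"

def refine(chunks: list[str], query: str) -> tuple[str, int]:
    # One LLM call per chunk: first with a QA-style prompt, then
    # refine-style prompts that carry the existing answer forward.
    calls = 0
    answer = ""
    for chunk in chunks:
        prompt = f"Context: {chunk}\nQuery: {query}\nExisting answer: {answer}"
        answer = fake_llm(prompt)
        calls += 1
    return answer, calls

def compact(chunks: list[str], query: str) -> tuple[str, int]:
    # Repack the chunks so each one fills as much of the window as possible,
    # then run the same refine loop over the (fewer) repacked chunks.
    packed, current = [], ""
    for chunk in chunks:
        candidate = (current + "\n" + chunk).strip()
        if len(candidate) > WINDOW and current:
            packed.append(current)
            current = chunk
        else:
            current = candidate
    if current:
        packed.append(current)
    return refine(packed, query)

chunks = ["alpha " * 8, "bravo " * 8, "charlie " * 8]  # ~48 chars each
print(refine(chunks, "q")[1])   # 3 LLM calls, one per retrieved chunk
print(compact(chunks, "q")[1])  # fewer calls after repacking
```

With three ~48-character chunks and a 100-character window, compacting merges the first two chunks, so the refine loop runs over two repacked chunks instead of three.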
Thanks so much, this confirms my findings when tracing the code. I appreciate your answer!