
Updated 2 years ago

Nodes

Can anyone tell me how LlamaIndex sends the content of the Nodes to the LLM? Does it do so by filling the {context_str} variable in the retriever prompt template? I searched in the code and did not find it
4 comments
That's pretty much it. Retrieve nodes from the index, then format them to ensure you don't exceed the LLM's max input size.
Thanks! And I think I even found the code in /llama_index/response_synthesizers/refine.py πŸ™‚
Depending on the response mode (refine, compact, etc), the retrieved chunks are concatenated (or not), creating a "new" set of chunks (or just the original one).

For example, if you retrieved 3 chunks, the "refine" synthesizer will call the LLM 3 times (using the qa_template and refine_template). But with "compact", it will try to fit the 3 chunks within the context window of the LLM.
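The difference in call counts can be sketched with a small pure-Python simulation. Note this is an illustration of the idea only: the window size and character-based packing below are simplifying assumptions, not LlamaIndex's actual token accounting.

```python
def refine_calls(chunks):
    # "refine" mode: one LLM call per retrieved chunk
    # (the first call uses the qa_template, later ones the refine_template).
    return len(chunks)

def compact_calls(chunks, window=100):
    # "compact" mode: greedily pack chunks into the context window,
    # so several chunks can share a single LLM call.
    calls, used = 0, window  # start "full" to force a call for the first chunk
    for chunk in chunks:
        if used + len(chunk) > window:
            calls += 1
            used = 0
        used += len(chunk)
    return calls

chunks = ["a" * 40, "b" * 40, "c" * 40]
print(refine_calls(chunks))   # 3 -- one call per chunk
print(compact_calls(chunks))  # 2 -- the first two chunks fit in one window
```

With three 40-character chunks and a 100-character window, "refine" makes three calls while "compact" needs only two, which is exactly the saving the comment above describes.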

If the original or concatenated chunks don't fit the window (given a prompt), they are split, producing a new set of chunks. It is those chunks that are actually sent to the LLM, replacing the {context_str} placeholder.
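The final substitution step can be illustrated with plain string formatting. The template text below paraphrases the shape of a default qa_template; it is not copied from the library, and the joining/formatting is a sketch of the idea rather than LlamaIndex's exact implementation.

```python
QA_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information, answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)

def build_prompt(chunks, query):
    # The (possibly re-split) chunks are joined into one string that
    # fills {context_str}; the user question fills {query_str}.
    context = "\n\n".join(chunks)
    return QA_TEMPLATE.format(context_str=context, query_str=query)

prompt = build_prompt(
    ["LlamaIndex retrieves nodes.", "Nodes become context."],
    "How are nodes sent to the LLM?",
)
print(prompt)
```

After formatting, the placeholder is gone and the retrieved text sits inline in the prompt that goes to the LLM.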
Thanks so much, this confirms my findings when tracing the code. I appreciate your answer!