Can anyone tell me how LlamaIndex sends the content of the Nodes to the LLM? Does it do so by filling the {context_str} variable in the retriever prompt template? I searched the code but couldn't find where this happens.
Depending on the response mode (refine, compact, etc.), the retrieved chunks are either concatenated or kept as-is, producing a "new" set of chunks (or just the original ones).
For example, if you retrieved 3 chunks, the "refine" synthesizer will call the LLM 3 times: once with the qa_template for the first chunk, then once with the refine_template for each subsequent chunk. With "compact", it will first try to fit all 3 chunks within the LLM's context window, reducing the number of calls.
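To make the difference concrete, here's a rough sketch of the two call patterns. This is illustrative only, not LlamaIndex's actual implementation: `fake_llm`, the template strings, and the character-based window counting are all made up for the example (the real library counts tokens, not characters).

```python
# Hypothetical templates with the {context_str} placeholder, mimicking
# the shape of LlamaIndex's qa_template / refine_template.
QA_TEMPLATE = "Context:\n{context_str}\n\nQuestion: {query_str}\nAnswer:"
REFINE_TEMPLATE = (
    "Existing answer: {existing_answer}\n"
    "New context:\n{context_str}\n"
    "Refine the answer to: {query_str}"
)

def fake_llm(prompt):
    # Stand-in for a real LLM call.
    return "answer"

def refine(chunks, query):
    """One LLM call per chunk: qa_template first, refine_template after."""
    answer, calls = None, 0
    for chunk in chunks:
        if answer is None:
            prompt = QA_TEMPLATE.format(context_str=chunk, query_str=query)
        else:
            prompt = REFINE_TEMPLATE.format(
                existing_answer=answer, context_str=chunk, query_str=query
            )
        answer = fake_llm(prompt)
        calls += 1
    return answer, calls

def compact(chunks, query, window=300):
    """Pack chunks into as few context blocks as fit, then refine over those."""
    # Room left for context once the template and query are accounted for.
    budget = window - len(QA_TEMPLATE.format(context_str="", query_str=query))
    packed, current = [], ""
    for chunk in chunks:
        if current and len(current) + len(chunk) + 1 > budget:
            packed.append(current)
            current = chunk
        else:
            current = chunk if not current else current + "\n" + chunk
    if current:
        packed.append(current)
    return refine(packed, query)
```

With 3 retrieved chunks that all fit in one window, `refine` makes 3 LLM calls while `compact` packs them into a single context block and makes just 1.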
If the original chunks or the concatenated chunks don't fit the window (given a prompt), they are split, resulting in a new set of chunks. Those final chunks are what actually get sent to the LLM, filling the {context_str} placeholder in the prompt template.
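Sketched out, that last step might look like this. Again, this is a simplified illustration, not the library's real code: `build_prompts` is a hypothetical helper, and it measures the window in characters rather than tokens to keep the example self-contained.

```python
# Illustrative sketch of re-splitting an oversized context and filling
# the {context_str} placeholder before each LLM call.
QA_TEMPLATE = "Context information is below.\n{context_str}\nQuery: {query_str}\nAnswer:"

def build_prompts(chunks, query, window=120):
    # Space left for context after subtracting the template and the query.
    budget = window - len(QA_TEMPLATE.format(context_str="", query_str=query))
    combined = "\n".join(chunks)
    if len(combined) <= budget:
        pieces = [combined]  # everything fits: one prompt, one LLM call
    else:
        # Too big: re-split into window-sized pieces; each piece becomes
        # its own prompt (and its own LLM call).
        pieces = [combined[i:i + budget] for i in range(0, len(combined), budget)]
    return [QA_TEMPLATE.format(context_str=p, query_str=query) for p in pieces]
```

Each returned prompt is guaranteed to fit the window, and it's these pieces, not necessarily the originally retrieved nodes, that end up in {context_str}.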