Concretely, our method consists of two steps: a RAG-and-Route step and a long-context prediction step. In the first step, we provide the query and the retrieved chunks to the LLM, and prompt it to predict whether the query is answerable and, if so, generate the answer. This is similar to standard RAG, with one key difference: the LLM is given the option to decline answering with the prompt "Write unanswerable if the query can not be answered based on the provided text". For the queries deemed answerable, we accept the RAG prediction as the final answer. For the queries deemed unanswerable, we proceed to the second step, providing the full context to the long-context LLMs to obtain the final prediction (i.e., LC).
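A minimal sketch of that two-step routing, under my own assumptions: `llm` and `retrieve` are placeholders for whatever model call and retriever you use, and only the "Write unanswerable ..." instruction is taken from the quoted text; everything else (function names, prompt wording) is illustrative, not the paper's exact implementation.

```python
# Sketch of RAG-and-Route with a long-context fallback (placeholders, not the paper's code).

def llm(prompt: str) -> str:
    raise NotImplementedError("call your LLM of choice here")  # placeholder

def retrieve(query: str, k: int = 5) -> list[str]:
    raise NotImplementedError("return the top-k retrieved chunks for the query")  # placeholder

def rag_and_route(query: str, full_context: str) -> str:
    # Step 1: standard RAG, but let the model decline to answer.
    chunks = "\n\n".join(retrieve(query))
    rag_prompt = (
        "Answer the query using only the provided text. "
        "Write unanswerable if the query can not be answered based on the provided text.\n\n"
        f"Text:\n{chunks}\n\nQuery: {query}"
    )
    answer = llm(rag_prompt)

    # Step 2: only for queries routed as unanswerable, re-ask with the full context (LC).
    if "unanswerable" in answer.lower():
        answer = llm(f"Text:\n{full_context}\n\nQuery: {query}")
    return answer
```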
Interesting method, but I propose two different approaches, inspired by Lance:
- Use parent-document retrieval where, using metadata, you do a vector search that fetches all topic-related chunks, which then retrieve their parent document (see the first sketch after this list).
This means you would not need to ingest the complete document; you could, for example, ingest only the relevant chapter(s) based on the query.
- There are multiple ways to implement this method, but the bottleneck right now seems to be finding a proper way to create reliable metadata for the data (a rough labeling sketch is included below as well).
From my early testing, there is no real solution yet for accurate metadata labeling that is both affordable and usable.
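A rough sketch of the parent-document idea, under these assumptions of mine: each small chunk carries a `parent_id` (e.g. a chapter ID) in its metadata, vector search runs over the chunks, and the matching parents are what get handed to the LLM. The `embed` function and the in-memory index are placeholders, not a specific vector DB's API.

```python
# Parent-document retrieval sketch: search small chunks, return their parent sections.

from dataclasses import dataclass

def embed(text: str) -> list[float]:
    raise NotImplementedError("use your embedding model here")  # placeholder

@dataclass
class Chunk:
    text: str
    vector: list[float]
    parent_id: str  # metadata linking the chunk to its parent chapter/section

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve_parents(query: str, chunks: list[Chunk],
                     parents: dict[str, str], k: int = 5) -> list[str]:
    """Vector-search the chunks, then resolve and deduplicate their parent documents."""
    qv = embed(query)
    top = sorted(chunks, key=lambda c: cosine(qv, c.vector), reverse=True)[:k]
    seen, parent_texts = set(), []
    for c in top:
        if c.parent_id not in seen:  # keep rank order, one copy per parent
            seen.add(c.parent_id)
            parent_texts.append(parents[c.parent_id])
    return parent_texts
```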
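On the metadata bottleneck: one common (and, as noted, imperfect) approach is to ask an LLM to label each chunk at ingestion time. A hedged sketch, where `llm` is again a placeholder and the metadata schema is purely illustrative:

```python
# Illustrative LLM-based metadata labeling at ingestion time; the schema is an assumption.
import json

def llm(prompt: str) -> str:
    raise NotImplementedError("call your LLM of choice here")  # placeholder

def label_chunk(chunk_text: str) -> dict:
    """Ask an LLM for structured metadata about one chunk."""
    prompt = (
        "Return a JSON object with keys 'topic', 'chapter_title', and 'keywords' "
        "describing the following text.\n\n" + chunk_text
    )
    try:
        return json.loads(llm(prompt))
    except json.JSONDecodeError:
        return {}  # labeling is unreliable; callers should handle missing metadata
```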