appreciated 💪 I think this topic came up before in a talk by Lance from LangChain (https://www.youtube.com/watch?v=UlmyyYQGhzc), summary:
- Context lengths for LLMs are increasing, raising questions about the necessity of external retrieval systems like RAG, especially when massive amounts of context can be fed directly into LLMs.
- Greg Kamradt's Needle in a Haystack analysis tested LLMs' ability to retrieve specific facts across varying context lengths and placements within documents, revealing retrieval failures, particularly for facts placed towards the start of longer documents.
- Real RAG use cases often need multi-fact retrieval — pulling several facts out of one context. Google's recent 100-needle demo shows why efficient multi-needle retrieval matters for comprehensive understanding.
- Retrieval from long contexts doesn't guarantee retrieval of multiple facts, especially with increasing context size and number of needles.
- Cost of long-context tests can be managed effectively, with careful budgeting enabling meaningful research without significant financial strain.
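The single-needle setup described above can be sketched in a few lines. This is a minimal illustrative harness (not Greg Kamradt's actual code; names and parameters are my own), assuming you'd pass the resulting context plus a question to an LLM and check the answer:

```python
# Sketch of a needle-in-a-haystack test: plant one "needle" fact at a chosen
# relative depth inside filler context of a given size.

def build_haystack(filler: str, needle: str, depth: float, context_chars: int) -> str:
    """Insert `needle` at relative `depth` (0.0 = start, 1.0 = end) of the context."""
    haystack = (filler * (context_chars // len(filler) + 1))[:context_chars]
    pos = int(len(haystack) * depth)
    return haystack[:pos] + " " + needle + " " + haystack[pos:]

filler = "The sky was grey and the streets were quiet. "
needle = "The secret ingredient is saffron."
context = build_haystack(filler, needle, depth=0.25, context_chars=2000)
question = "What is the secret ingredient?"
# The full analysis sweeps depth over [0, 1] and context_chars up to the model's
# window, then scores whether the model's answer contains the needle fact.
```

Sweeping both axes is what produces the familiar depth-vs-context-length heatmaps from the talk.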
Limitations of longer contexts:
- No retrieval guarantees: multiple facts are not all reliably retrieved, especially as the number of needles and the context size increase.
- GPT-4 tends to miss facts placed near the start of the document, and failures grow with context length.
- Specific prompting is needed for larger contexts.
- Performance degrades when the LLM is asked to reason about the retrieved facts, and degrades further as the context grows.
- Longer contexts are pricey and take longer to generate.
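For the multi-needle case, scoring comes down to: of the facts we planted, how many show up in the model's answer? A hedged sketch (the substring-matching rule is a simplification I'm assuming, not the talk's exact method; the pizza-ingredient needles echo the example used in the multi-needle analysis):

```python
# Score multi-needle retrieval: fraction of planted needles that appear in the answer.

def multi_needle_recall(needles: list[str], answer: str) -> float:
    found = sum(1 for n in needles if n.lower() in answer.lower())
    return found / len(needles)

needles = ["figs", "prosciutto", "goat cheese"]
answer = "The secret pizza ingredients mentioned were figs and goat cheese."
recall = multi_needle_recall(needles, answer)  # 2 of 3 needles recovered
```

Plotting this recall against context size and needle count is what surfaces the degradation described above.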
My takes: in the future there will be less focus on indexing/chunking and more focus on improving retrieval while reducing hallucinations. DSPy could be interesting for this.