Updated last year

hi! I'm looking for a way to index a collection of PDFs so that I can later perform two-hop querying:
hop 1: select relevant PDFs based on a description of each document, generated from the entire PDF's contents
hop 2: from the selected PDFs, select pages based on their descriptions

so both hops would have their lookup implemented using an LLM. if I were doing just hop 2, I think I could just use DocumentSummaryIndex. how hard would it be to set up this two-hop process?
I think this is exactly a recursive retriever

Make an index per PDF

Fetch each index based on its description, then retrieve from there
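To make the shape of that concrete, here is a minimal pure-Python sketch of the two hops. It is not LlamaIndex code: `Page`, `PdfIndex`, `keyword_overlap` and `two_hop_retrieve` are hypothetical names made up for illustration, and the word-overlap score is only a stand-in for whatever LLM-based selection you would actually plug in.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Page:
    number: int
    description: str  # assumed to be a summary of this page, e.g. LLM-generated


@dataclass
class PdfIndex:
    name: str
    description: str  # assumed to be a summary of the whole PDF
    pages: List[Page] = field(default_factory=list)


def keyword_overlap(query: str, description: str) -> float:
    """Toy relevance score: fraction of query words found in the description.

    Stand-in for an LLM call that judges relevance.
    """
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & set(description.lower().split())) / len(query_words)


def top_k(query: str, items: list, k: int, score: Callable[[str, str], float]) -> list:
    """Return up to k items with a non-zero score, best first."""
    scored = [(score(query, item.description), item) for item in items]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:k]]


def two_hop_retrieve(
    query: str,
    pdfs: List[PdfIndex],
    pdf_k: int = 1,
    page_k: int = 2,
    score: Callable[[str, str], float] = keyword_overlap,
) -> Dict[str, List[Page]]:
    results: Dict[str, List[Page]] = {}
    # Hop 1: pick PDFs by their document-level description.
    for pdf in top_k(query, pdfs, pdf_k, score):
        # Hop 2: within each selected PDF, pick pages by page-level description.
        results[pdf.name] = top_k(query, pdf.pages, page_k, score)
    return results


# Made-up example data, purely for illustration.
pdfs = [
    PdfIndex("solar.pdf", "report on solar panel efficiency", [
        Page(1, "introduction to solar energy"),
        Page(2, "panel efficiency measurements in winter"),
    ]),
    PdfIndex("wind.pdf", "report on wind turbine maintenance", [
        Page(1, "turbine maintenance schedule"),
    ]),
]
hits = two_hop_retrieve("solar panel efficiency", pdfs)
for name, pages in hits.items():
    print(name, [page.number for page in pages])
```

The only part that changes when you swap in an LLM is the `score` callable; the outer loop (select documents, then select pages inside each selected document) stays the same.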
Ah, thanks! Looking at the guide, it's surprisingly low-level and elaborate, though.
A little bit, but there's not really a good way to automate this -- needs a human touch
Maybe. However, I get the feeling that LlamaIndex has a DX gap to close, especially when I compare it to something like Astro.js. I know Astro is in a completely different and more mature space, but I would definitely look at Astro for inspiration when it comes to both abstractions and general DX.
if you have any actionable suggestions, would love to know 🙂 In general the library is extremely young -- always trying to improve, reduce tech debt, and make it easier for others to contribute
Yes, I keep in mind that LlamaIndex is pre-1.0. Two things come to my mind:
  1. More high-level docs explaining the design of LlamaIndex so one can build a mental model of how LlamaIndex approaches RAG and what to expect from it. Things I'd cover are: how information flows, how prompting works, how you approach customizing prompts, and what trade-offs and priorities you chose. See: https://docs.astro.build/en/concepts/why-astro/
  2. Better tools for introspection. For example, it would be amazing if every abstraction like Node, Document, Index, etc. had a method introspection_guide() which would print in a REPL/notebook all the ways one can poke at an object and see what's inside. For now, I go to the source code and try to work out which properties to print to verify that e.g. DocumentSummaryIndex did what I think it did.
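To illustrate point 2, here is a rough sketch of what such a helper could look like as a standalone function. `introspection_guide` does not exist in LlamaIndex; it is the hypothetical name from the suggestion above, and `ToyNode` is a made-up stand-in class rather than a real LlamaIndex type.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


def introspection_guide(obj: Any) -> Dict[str, List[str]]:
    """Print and return the public attributes and methods of obj."""
    attributes: List[str] = []
    methods: List[str] = []
    for name in dir(obj):
        if name.startswith("_"):
            continue  # skip private and dunder names
        try:
            value = getattr(obj, name)
        except Exception:
            continue  # some properties raise when accessed; skip them
        (methods if callable(value) else attributes).append(name)

    print(f"{type(obj).__name__}")
    print("  attributes you can print:")
    for name in attributes:
        print(f"    obj.{name}")
    print("  methods you can call:")
    for name in methods:
        print(f"    obj.{name}(...)")
    return {"attributes": attributes, "methods": methods}


# Hypothetical stand-in class, only for demonstration.
@dataclass
class ToyNode:
    text: str
    metadata: dict = field(default_factory=dict)

    def get_content(self) -> str:
        return self.text


guide = introspection_guide(ToyNode(text="page 1 summary"))
```

Something like this can be pointed at any object in a notebook today, although a built-in version could go further and explain what each property means.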