Has anyone tried using the recursive retriever with embedded tables across several hundred documents? With a large number of complex documents, each containing different embedded tables, I don't think the recursive retriever can be used effectively. Am I right, or am I misunderstanding something? We could build a complex index structure for each document separately, but it wouldn't be efficient to have an LLM decide which of hundreds of candidates to use. Some kind of embedding-based routing seems like a good idea to me. I'm currently working on it, but let me know if there is a better way.
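To make the embedding-based routing idea concrete, here is a minimal sketch of what it could look like. It is not LlamaIndex code: `embed` is a toy bag-of-words stand-in for a real embedding model, and the document ids and summaries are made up for illustration. The point is only that the query is compared against one vector per document, so no LLM call is needed to pick among hundreds of candidates.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical one-line summaries, one per document (assumed, not from the thread).
doc_summaries = {
    "product_a": "datasheet for product a industrial pump with size and weight table",
    "product_b": "datasheet for product b solar inverter with size and weight table",
    "product_c": "installation manual for product c battery storage",
}

# Embed each summary once, up front.
doc_vectors = {doc_id: embed(summary) for doc_id, summary in doc_summaries.items()}


def route(query, top_k=2):
    """Return the ids of the top_k documents whose summary is closest to the query."""
    q = embed(query)
    ranked = sorted(doc_vectors, key=lambda d: cosine(q, doc_vectors[d]), reverse=True)
    return ranked[:top_k]


routed = route("what is the weight of the solar inverter", top_k=1)
print(routed)
```

Each routed id would then be handed to that document's own index or retriever. With a real embedding model the summaries would probably need to mention what the tables contain, otherwise the router has the same blind spot as the retriever.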
It does, but what I've experienced is that if a table doesn't explicitly contain information specific to the context, it won't be retrieved. So if I have a document about product A that contains a table with mostly general information (size, weight, etc.), the recursive retriever won't know which document that table came from. Say I want to compare two products whose documents both contain the same basic table: it won't know that table 1 belongs to product A and table 2 belongs to product B.
I guess the idea would be to have one IndexNode for product A and one for product B, with each index node pointing to a retriever for data specific to that product, including its tables.
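A rough sketch of that two-level structure, in plain Python rather than the actual LlamaIndex classes: `ProductNode` plays the role of an index node, its `retrieve` method plays the role of the per-product retriever, and the product names, chunks, and keyword matching are all assumptions for illustration. Because every chunk is returned through its product's node, a generic table still comes back labelled with the product it belongs to.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    kind: str  # "text" or "table"


@dataclass
class ProductNode:
    """Stand-in for an index node: a product label pointing at product-specific data."""
    product: str
    chunks: list = field(default_factory=list)

    def retrieve(self, query):
        """Toy per-product retriever: keep chunks sharing at least one word with the query."""
        words = set(query.lower().split())
        hits = [c for c in self.chunks if words & set(c.text.lower().split())]
        # Tag every hit with the product, so generic tables keep their origin.
        return [(self.product, c) for c in hits]


def recursive_retrieve(query, nodes):
    """Step 1: pick the product nodes the query mentions. Step 2: query only those."""
    words = set(query.lower().split())
    selected = [n for n in nodes if n.product in words]
    results = []
    for node in selected:
        results.extend(node.retrieve(query))
    return results


# Hypothetical products whose documents contain the same kind of basic table.
nodes = [
    ProductNode("alpha", [Chunk("alpha is a compact pump", "text"),
                          Chunk("size 10 cm | weight 2 kg", "table")]),
    ProductNode("beta", [Chunk("beta is a solar inverter", "text"),
                         Chunk("size 30 cm | weight 9 kg", "table")]),
]

results = recursive_retrieve("compare the weight of alpha and beta", nodes)
tables = [(product, chunk.text) for product, chunk in results if chunk.kind == "table"]
print(tables)
```

In a real setup the top-level selection would be done by embedding similarity against each node's description rather than exact name matching, but the shape is the same: the table is only ever reached through its product's node.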