It's just an object/class that holds nodes and information about their parent documents
Essentially it's a giant key-value store
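To make the "giant key-value store" idea concrete, here is a minimal sketch in plain Python. `Node` and `ToyDocStore` are hypothetical names invented for illustration, not LlamaIndex classes; the real docstore has a richer interface, but the shape (node id to node, plus a record of which nodes came from which parent document) is roughly this:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a chunk of a parent document."""
    node_id: str
    text: str
    ref_doc_id: str  # id of the parent document this chunk came from


@dataclass
class ToyDocStore:
    """Hypothetical key-value docstore: node_id -> Node, plus parent-document info."""
    nodes: Dict[str, Node] = field(default_factory=dict)
    ref_doc_info: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, node: Node) -> None:
        # Store the node under its id and record it against its parent document.
        self.nodes[node.node_id] = node
        self.ref_doc_info.setdefault(node.ref_doc_id, []).append(node.node_id)

    def get(self, node_id: str) -> Optional[Node]:
        # Pure key lookup: no search, no ranking.
        return self.nodes.get(node_id)


store = ToyDocStore()
store.add(Node("n1", "Solr is a search platform.", ref_doc_id="doc-a"))
store.add(Node("n2", "Elasticsearch is another one.", ref_doc_id="doc-a"))
```

Note that nothing here ranks or searches; you can only fetch by key, which is the distinction the rest of this thread turns on.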
I think many orgs currently use Apache Solr for their search platforms. If we could provide an integration with Solr just like we did for ES, that would be great! And I'm currently working on this
Also, about the notion of document stores being purely key-value stores: do you think that might change? The code seems to suggest it might.
For example, what if instead of splitting documents into nodes and then storing them, we want to do a traditional keyword search on the query and let the LLM summarize the retrieved results?
I'd actually like to see LLMs use normal file metadata to describe file contents and then also be able to use that to identify which files are relevant to it/us
Being able to use that to inform things like SimpleDirectoryReader or CSVReader so it knows what it's looking at via extra_info seems prudent.
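A rough sketch of what that could look like: a small function that turns ordinary filesystem metadata into a dict. The field names are my own choice, not a standard. As far as I know, `SimpleDirectoryReader` accepts a `file_metadata` callable that takes a file path and returns a dict, which ends up on each document's metadata (`extra_info` in older versions), so the usage in the trailing comment is an assumption about your installed version and is not executed here:

```python
import os
from datetime import datetime, timezone
from typing import Dict


def file_metadata(path: str) -> Dict[str, object]:
    """Collect ordinary filesystem metadata for one file (illustrative field names)."""
    stat = os.stat(path)
    return {
        "file_name": os.path.basename(path),
        "file_type": os.path.splitext(path)[1].lstrip(".").lower(),
        "file_size": stat.st_size,  # size in bytes
        "last_modified": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
    }


# Assumed usage (requires llama_index; not run here):
# from llama_index.core import SimpleDirectoryReader
# docs = SimpleDirectoryReader("data", file_metadata=file_metadata).load_data()
```

Richer metadata (lyrics, tempo, and so on) would need a format-specific extractor, but it could be returned from the same kind of function.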
I think this is already possible by using the QueryPipeline though.
Or maybe you should only need to make small extensions to extract file metadata
I'm actually suspecting we're going to want to change file headers now.
Files are no longer something only we access ourselves, and extra metadata such as lyrics, tempo, etc. for music files would be valuable because it lets our own AI models make better use of our data.
Hey @Logan M
Sorry to tag you again. I was just wondering if you have any thoughts on this, namely integrating Solr into LlamaIndex just like we have for ES, and also whether the use case of supporting keyword retrieval on an inverted index (which current docstores do not support) is still relevant in the RAG era.
It's definitely possible to add this -- I don't think retrieval techniques belong on the docstore though, that's either a retriever or a vector store
Open to contributions in any case
But what if the docstore supports efficient retrieval of documents of interest?
As is the case with ES and Solr
Otherwise, I don't see the point of distinguishing docstores and kvstores in the codebase. Most of the docstores inherit from KVDocumentStore instead of BaseDocumentStore anyway
yea a docstore is just a key-value lookup interface
I don't see why you can't define a retriever on top of the same collection a docstore uses.
imo docstores are not for retrieval, just key/val lookup and metadata tracking
Just trying to have clear responsibilities for classes
I see. What is the base class for a retriever?
It has two methods to implement: _retrieve() and optionally _aretrieve()
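For anyone following along, here is a minimal sketch of that pattern in plain Python, with a keyword retriever defined over the same collection a docstore would use. `ToyBaseRetriever` and `KeywordRetriever` are hypothetical stand-ins; in LlamaIndex the real base class is `BaseRetriever`, whose `_retrieve()` takes a `QueryBundle` and returns a list of `NodeWithScore`, whereas this sketch uses plain strings and `(node_id, score)` tuples so it runs without the library. The term-overlap scoring is a placeholder for what Solr or ES would do on their inverted index:

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

Scored = List[Tuple[str, float]]  # (node_id, score) pairs


class ToyBaseRetriever(ABC):
    """Hypothetical stand-in for a retriever base class."""

    def retrieve(self, query: str) -> Scored:
        return self._retrieve(query)

    async def aretrieve(self, query: str) -> Scored:
        return await self._aretrieve(query)

    @abstractmethod
    def _retrieve(self, query: str) -> Scored:
        """Required: the synchronous retrieval logic."""

    async def _aretrieve(self, query: str) -> Scored:
        # Optional to override; by default fall back to the sync path.
        return self._retrieve(query)


class KeywordRetriever(ToyBaseRetriever):
    """Scores each node by how many distinct query terms it contains."""

    def __init__(self, collection: Dict[str, str], top_k: int = 2) -> None:
        self.collection = collection  # same node_id -> text mapping a docstore holds
        self.top_k = top_k

    def _retrieve(self, query: str) -> Scored:
        terms = set(query.lower().split())
        scored = []
        for node_id, text in self.collection.items():
            overlap = len(terms & set(text.lower().split()))
            if overlap:
                scored.append((node_id, float(overlap)))
        # Highest score first; ties broken by node id for determinism.
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[: self.top_k]


collection = {
    "n1": "solr builds an inverted index for keyword search",
    "n2": "elasticsearch also supports keyword search",
    "n3": "vector stores hold embeddings",
}
retriever = KeywordRetriever(collection)
results = retriever.retrieve("inverted index keyword search")
```

The docstore keeps doing key/value lookup and the retriever owns the ranking, so the two can share one backing collection without mixing responsibilities.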
Perfect! I'll have a look at those. Thank you Logan!