Can you retrieve original documents from a VectorStoreIndex?
At a glance
The community member has built a VectorStoreIndex (which they describe as a QdrantStoreIndex) over a set of documents and asks whether the original documents can be retrieved from it, so that they can build a SummaryIndex on top of them. Their understanding is that this is not possible, and that they would instead need to create a separate DocumentStore and build the SummaryIndex on that.
In the comments, another community member suggests that the original documents can be recovered by "stitching" the related nodes together, since nodes created from a Document object carry relationships to one another. They also mention a method in the Qdrant vector store class that can fetch up to 9999 nodes at once.
However, the original poster responds that stitching the nodes together does not seem like a practical approach, as the nodes have overlapping text which could be a source of bugs.
There is no explicitly marked answer in the comments.
Hi gang. I'm posting this question for clarification.
I have created a VectorStoreIndex (specifically a QdrantStoreIndex) on a bunch of documents. Can I retrieve the original documents from the VectorStoreIndex? If I can retrieve the original documents, then I'd like to build a SummaryIndex on top of them.
My understanding is that I cannot retrieve the original documents from the VectorStoreIndex, and that I would have to create a separate DocumentStore and build the SummaryIndex on it. Please clarify.
Thanks @WhiteFang_Jr. Stitching the nodes does not feel like a practical approach, since the nodes have overlapping text and that might become a source of bugs.
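If stitching is attempted anyway, one way to sidestep the overlap problem is to rely on character offsets rather than on text matching. The sketch below is plain Python, not LlamaIndex API: `Chunk` and `stitch_documents` are hypothetical names, and it assumes each retrieved node exposes the id of its source document and the offset of its first character in that document (LlamaIndex nodes have `ref_doc_id` and `start_char_idx` fields, but whether they are populated depends on how the nodes were created).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for a retrieved node (not a LlamaIndex class)."""
    ref_doc_id: str      # id of the source document
    start_char_idx: int  # offset of the chunk's first character in the source
    text: str


def stitch_documents(chunks):
    """Rebuild each source document from overlapping chunks using offsets."""
    by_doc = defaultdict(list)
    for chunk in chunks:
        by_doc[chunk.ref_doc_id].append(chunk)

    documents = {}
    for doc_id, doc_chunks in by_doc.items():
        doc_chunks.sort(key=lambda c: c.start_char_idx)
        parts = []
        covered = 0  # number of source characters already emitted
        for chunk in doc_chunks:
            if chunk.start_char_idx > covered:
                # Fail loudly instead of silently producing a document with holes.
                raise ValueError(f"gap in {doc_id!r} at offset {covered}")
            # Skip the prefix that earlier chunks already covered.
            new_text = chunk.text[covered - chunk.start_char_idx:]
            parts.append(new_text)
            covered += len(new_text)
        documents[doc_id] = "".join(parts)
    return documents


# Usage: three overlapping chunks of one document, supplied out of order.
source = "The quick brown fox jumps over the lazy dog."
chunks = [
    Chunk("doc-1", 16, source[16:35]),
    Chunk("doc-1", 0, source[0:20]),
    Chunk("doc-1", 30, source[30:]),
]
restored = stitch_documents(chunks)
```

This only works when every chunk of a document is fetched and the offsets are trustworthy. If the splitter trims whitespace between chunks, the gap check will fire, which is arguably safer than guessing at the missing characters.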