I have a question about updating documents in a vector store. I'm trying to figure out how to update nodes for a document when the document gets updated.
For example, if my article is revised, I want to update it in the vector store without doing a lot of extra work. However, it seems that even when I use docstore_strategy=DocstoreStrategy.UPSERTS_AND_DELETE, the nodes in my document don't automatically get replaced.
I manually construct my Document object with the same id as the original, expecting it to replace the existing one. Is there something I'm missing in how updates are handled? Why aren't my document nodes being replaced as expected?
I've seen a lot of similar questions on GitHub.
Are you attaching a docstore to the ingestion pipeline? Are you persisting that docstore anywhere?
Another key point: if the same document is loaded again, it needs to have the same ID. Otherwise, there's no way to anchor a comparison to see a) whether that document already exists and b) whether it has changed.
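To make the "anchor a comparison" point concrete, here is a rough, library-free sketch of the idea. It is not LlamaIndex's actual implementation: `ToyDocstore`, `content_hash`, and `upsert` are hypothetical names, the vector store is just a dict, and "chunking" is a naive split on sentences. It only shows why something has to remember the doc ID and a content hash between runs.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyDocstore:
    """Hypothetical stand-in for a persisted docstore: doc ID -> content hash."""
    hashes: dict = field(default_factory=dict)


def content_hash(text: str) -> str:
    # A stable hash of the document text, used to detect changes.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def upsert(docstore: ToyDocstore, vector_store: dict, doc_id: str, text: str) -> str:
    """Return 'inserted', 'updated', or 'skipped' for one incoming document."""
    new_hash = content_hash(text)
    old_hash = docstore.hashes.get(doc_id)

    # a) the document exists and b) it has not changed: nothing to do.
    if old_hash == new_hash:
        return "skipped"

    # Remove any stale nodes that belong to this document ID.
    stale = [node_id for node_id, node in vector_store.items() if node["doc_id"] == doc_id]
    for node_id in stale:
        del vector_store[node_id]

    # Naive "chunking" (assumption for illustration): one node per sentence.
    for i, chunk in enumerate(text.split(". ")):
        vector_store[f"{doc_id}-{i}"] = {"doc_id": doc_id, "text": chunk}

    # Remember the new hash so the next run can compare against it.
    docstore.hashes[doc_id] = new_hash
    return "inserted" if old_hash is None else "updated"


docstore, vector_store = ToyDocstore(), {}
print(upsert(docstore, vector_store, "article-1", "First. Second. Third"))
print(upsert(docstore, vector_store, "article-1", "First. Second. Third"))
print(upsert(docstore, vector_store, "article-1", "Revised. Text"))
```

If `docstore` is recreated empty on every run (that is, never persisted), every document looks new, so nothing is recognized as an update and the old nodes are never cleaned up.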
I mean, isn't it enough to just dedupe based on the document ID (document_id, doc_id) that is already included in the metadata_ field in the vector store? Having a separate DB just to keep a cache of the doc_id seems redundant and overly complicated for most cases.