Not really easily. You'd have to use the refresh feature, which depends on having the docstore (activated with an override) and having consistent doc_ids on input documents.
Then, when a document's content has been updated, its nodes will be overwritten.
Because if you want to support refresh functionality, there needs to be a layer on top of the vector db tracking what's been inserted (i.e. the docstore).
The example you shared does overwrite the index and document storage, but in the vector db it appears to be appending rather than overwriting. Is this the default pattern? If so, is there currently a way to overwrite the rows in the vector db?
In the example it should be appending, unless the document is already inserted, in which case it's an upsert. But it relies on the doc_id of the inputs being consistent (i.e. the same source document should have the same doc_id, even if its content has changed).
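To make the upsert behaviour concrete, here's a rough sketch of the idea in plain Python. This is not LlamaIndex's actual code: `Document`, `TrackedStore`, and `upsert` are hypothetical names made up for illustration, and the "vector db" is just a dict. It only shows why a stable doc_id plus a tracking layer lets you overwrite instead of append.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str  # must stay the same for the same source document
    text: str


def content_hash(text: str) -> str:
    """Fingerprint of the content, used to detect changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class TrackedStore:
    """Toy stand-in (assumption) for a docstore layered over a vector db."""

    hashes: dict = field(default_factory=dict)  # doc_id -> content hash (the "docstore")
    rows: dict = field(default_factory=dict)    # doc_id -> node texts (the "vector db")

    def upsert(self, doc: Document) -> str:
        new_hash = content_hash(doc.text)
        old_hash = self.hashes.get(doc.doc_id)
        if old_hash == new_hash:
            return "skipped"  # same doc_id, same content: nothing to do
        action = "inserted" if old_hash is None else "updated"
        # Replace the rows for this doc_id instead of appending to them.
        self.rows[doc.doc_id] = [s.strip() for s in doc.text.split(".") if s.strip()]
        self.hashes[doc.doc_id] = new_hash
        return action


store = TrackedStore()
print(store.upsert(Document("report-1", "First draft. More text.")))  # inserted
print(store.upsert(Document("report-1", "First draft. More text.")))  # skipped
print(store.upsert(Document("report-1", "Second draft.")))            # updated
```

If the doc_id changes between runs (for example, a freshly generated UUID each time), the tracking layer sees every input as new, so old rows are never replaced and the store just keeps growing.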
Because I have a complicated IngestionPipeline that occasionally fails (but continues so as to not fail very large batches), and now I need to update those failing nodes' metadata to match the pattern in the rest of the data