The community members are discussing how to overwrite existing nodes with new content in a Postgres vector store. A community member suggests using the "refresh" feature, which requires a docstore and consistent document IDs. Another community member provides an example using Chroma, noting that the approach would be similar for Postgres.
The discussion then covers the need to persist index and document storage in addition to the vector store, since the docstore is required for the refresh functionality to work. The community members also discuss the behavior of the vector store, which appears to append rather than overwrite, and whether there is a way to overwrite rows.
A community member suggests that the example should be appending, unless the document is already inserted, in which case it would be an upsert. This relies on the consistency of the document IDs.
Finally, a community member requests the ability to update nodes by ID, as they have a complicated ingestion pipeline where some nodes fail, and they need to update the metadata of those failing nodes without reloading the entire document.
Not really easily. You'd have to use the refresh feature, which depends on having the docstore (activated with an override) and having consistent doc_ids on input documents.
Then, when a document's content has been updated, its nodes will be overwritten
Because if you want to support refresh functionality, there needs to be a layer on top of the vector db tracking what's been inserted (i.e. the docstore)
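A minimal sketch of that layering with LlamaIndex's `PGVectorStore` (the connection details, table name, doc_id, and `embed_dim=1536` are placeholders, and an embedding model is assumed to be configured, e.g. via `Settings`):

```python
from llama_index.core import Document, StorageContext, VectorStoreIndex
from llama_index.vector_stores.postgres import PGVectorStore

# Postgres-backed vector store; connection details are placeholders.
vector_store = PGVectorStore.from_params(
    database="vector_db",
    host="localhost",
    port=5432,
    user="postgres",
    password="password",
    table_name="my_docs",
    embed_dim=1536,  # must match the embedding model in use
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Consistent doc_ids: the same source document always gets the same id.
docs = [Document(text="original content", doc_id="report-2023")]

# store_nodes_override=True keeps nodes in the docstore as well, which is
# the tracking layer on top of the vector db that refresh needs.
index = VectorStoreIndex.from_documents(
    docs,
    storage_context=storage_context,
    store_nodes_override=True,
)

# Persist the docstore/index store so the tracking survives between runs.
index.storage_context.persist(persist_dir="./storage")
```

On later runs you would reload the same persist_dir and vector store, then call `refresh_ref_docs` with the updated documents, as sketched further below.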
The example you shared does overwrite the index and document storage, but in the vectordb it appears to be appending rather than overwriting. Is this the default pattern? If so, is there currently a way to overwrite the rows in the vectordb?
In the example it should be appending, unless the document is already inserted, in which case it's an upsert. But it relies on the doc_id of the inputs being consistent (i.e. the same source document should have the same doc_id, even if its content has changed)
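For illustration, a sketch of that upsert behavior on a later run, assuming the index was built and persisted as in the earlier sketch (connection details and doc_ids are again placeholders):

```python
from llama_index.core import Document, StorageContext, load_index_from_storage
from llama_index.vector_stores.postgres import PGVectorStore

# Reconnect to the same vector store and reload the persisted docstore/index store.
vector_store = PGVectorStore.from_params(
    database="vector_db",
    host="localhost",
    port=5432,
    user="postgres",
    password="password",
    table_name="my_docs",
    embed_dim=1536,
)
storage_context = StorageContext.from_defaults(
    vector_store=vector_store, persist_dir="./storage"
)
index = load_index_from_storage(storage_context, store_nodes_override=True)

# Same doc_id with changed content -> that document's nodes are replaced (upsert).
# A doc_id not seen before -> its nodes are appended as a new document.
refreshed = index.refresh_ref_docs(
    [
        Document(text="updated content", doc_id="report-2023"),
        Document(text="a brand new document", doc_id="report-2024"),
    ]
)
print(refreshed)  # e.g. [True, True]; True means that document was (re)ingested

index.storage_context.persist(persist_dir="./storage")
```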
Because I have a complicated IngestionPipeline that occasionally fails on some nodes (but continues so that very large batches don't fail entirely), and now I need to update those failing nodes' metadata to match the pattern in the rest of the data
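The thread doesn't settle on a solution for this, but one possible workaround is sketched below: pull the affected nodes from the docstore by id, patch their metadata, and replace the corresponding rows in the vector store. The node ids, metadata key, and re-embedding step are illustrative only, and `delete_nodes` is assumed to be available on the installed vector store version (older releases may not implement it):

```python
from llama_index.core import Settings, StorageContext
from llama_index.core.schema import MetadataMode
from llama_index.vector_stores.postgres import PGVectorStore

vector_store = PGVectorStore.from_params(
    database="vector_db", host="localhost", port=5432,
    user="postgres", password="password",
    table_name="my_docs", embed_dim=1536,
)
storage_context = StorageContext.from_defaults(
    vector_store=vector_store, persist_dir="./storage"
)
docstore = storage_context.docstore

for node_id in ["node-id-that-failed"]:  # ids of the nodes that need fixing
    node = docstore.get_node(node_id)
    node.metadata["source"] = "fixed-value"  # hypothetical metadata patch

    # Re-embed the patched node so the replacement row carries a vector;
    # assumes an embedding model is configured on Settings.
    node.embedding = Settings.embed_model.get_text_embedding(
        node.get_content(metadata_mode=MetadataMode.EMBED)
    )

    # Keep the docstore copy in sync with the patched node.
    docstore.add_documents([node], allow_update=True)

    # Swap the row in Postgres: delete the stale entry, insert the patched node.
    # delete_nodes may not exist on older vector store releases.
    vector_store.delete_nodes(node_ids=[node_id])
    vector_store.add([node])

storage_context.persist(persist_dir="./storage")
```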