The community members are discussing how to overwrite existing nodes with new content in a Postgres vector store. A community member suggests using the "refresh" feature, which requires a docstore and consistent document IDs. Another community member provides an example using Chroma, noting that the approach would be similar for Postgres.
The discussion then covers the need to persist index and document storage in addition to the vector store, since the docstore is required for the refresh functionality to work. The community members also discuss the behavior of the vector store, which appears to append rather than overwrite, and whether there is a way to overwrite rows.
A community member suggests that the example should be appending, unless the document is already inserted, in which case it would be an upsert. This relies on the consistency of the document IDs.
Finally, a community member requests the ability to update nodes by ID, as they have a complicated ingestion pipeline where some nodes fail, and they need to update the metadata of those failing nodes without reloading the entire document.
Not really easily. You'd have to use the refresh feature, which depends on having the docstore (activated with an override) and having consistent doc_ids on input documents.
Then, when a document's content has been updated, its nodes will be overwritten
Because if you want to support refresh functionality, there needs to be a layer on top of the vector db tracking what's been inserted (i.e. the docstore)
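A minimal sketch of that layering with LlamaIndex's `PGVectorStore` (the connection details, table name, doc_id, and `embed_dim=1536` are placeholders, and an embedding model is assumed to be configured, e.g. via `Settings`):

```python
from llama_index.core import Document, StorageContext, VectorStoreIndex
from llama_index.vector_stores.postgres import PGVectorStore

# Postgres-backed vector store; connection details are placeholders.
vector_store = PGVectorStore.from_params(
    database="vector_db",
    host="localhost",
    port=5432,
    user="postgres",
    password="password",
    table_name="my_docs",
    embed_dim=1536,  # must match the embedding model in use
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# Consistent doc_ids: the same source document always gets the same id.
docs = [Document(text="original content", doc_id="report-2023")]

# store_nodes_override=True keeps nodes in the docstore as well, which is
# the tracking layer on top of the vector db that refresh needs.
index = VectorStoreIndex.from_documents(
    docs,
    storage_context=storage_context,
    store_nodes_override=True,
)

# Persist the docstore/index store so the tracking survives between runs.
index.storage_context.persist(persist_dir="./storage")
```

On later runs you would reload the same persist_dir and vector store, then call `refresh_ref_docs` with the updated documents, as sketched further below.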
The example you shared does overwrite the index and document storage, but in the vectordb it appears to be appending rather than overwriting. Is this the default pattern? If so, is there currently a way to overwrite the rows in the vectordb?
In the example it should be appending, unless the document is already inserted, in which case it's an upsert. But it relies on the doc_id of the inputs being consistent (i.e. the same source document should have the same doc_id, even if its content has changed)
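For illustration, a sketch of that upsert behavior on a later run, assuming the index was built and persisted as in the earlier sketch (connection details and doc_ids are again placeholders):

```python
from llama_index.core import Document, StorageContext, load_index_from_storage
from llama_index.vector_stores.postgres import PGVectorStore

# Reconnect to the same vector store and reload the persisted docstore/index store.
vector_store = PGVectorStore.from_params(
    database="vector_db",
    host="localhost",
    port=5432,
    user="postgres",
    password="password",
    table_name="my_docs",
    embed_dim=1536,
)
storage_context = StorageContext.from_defaults(
    vector_store=vector_store, persist_dir="./storage"
)
index = load_index_from_storage(storage_context, store_nodes_override=True)

# Same doc_id with changed content -> that document's nodes are replaced (upsert).
# A doc_id not seen before -> its nodes are appended as a new document.
refreshed = index.refresh_ref_docs(
    [
        Document(text="updated content", doc_id="report-2023"),
        Document(text="a brand new document", doc_id="report-2024"),
    ]
)
print(refreshed)  # e.g. [True, True]; True means that document was (re)ingested

index.storage_context.persist(persist_dir="./storage")
```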
Because I have a complicated IngestionPipeline that occasionally fails on some nodes (but continues so that very large batches don't fail entirely), and now I need to update those failing nodes' metadata to match the pattern in the rest of the data
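The thread doesn't settle on a solution for this, but one possible workaround is sketched below: pull the affected nodes from the docstore by id, patch their metadata, and replace the corresponding rows in the vector store. The node ids, metadata key, and re-embedding step are illustrative only, and `delete_nodes` is assumed to be available on the installed vector store version (older releases may not implement it):

```python
from llama_index.core import Settings, StorageContext
from llama_index.core.schema import MetadataMode
from llama_index.vector_stores.postgres import PGVectorStore

vector_store = PGVectorStore.from_params(
    database="vector_db", host="localhost", port=5432,
    user="postgres", password="password",
    table_name="my_docs", embed_dim=1536,
)
storage_context = StorageContext.from_defaults(
    vector_store=vector_store, persist_dir="./storage"
)
docstore = storage_context.docstore

for node_id in ["node-id-that-failed"]:  # ids of the nodes that need fixing
    node = docstore.get_node(node_id)
    node.metadata["source"] = "fixed-value"  # hypothetical metadata patch

    # Re-embed the patched node so the replacement row carries a vector;
    # assumes an embedding model is configured on Settings.
    node.embedding = Settings.embed_model.get_text_embedding(
        node.get_content(metadata_mode=MetadataMode.EMBED)
    )

    # Keep the docstore copy in sync with the patched node.
    docstore.add_documents([node], allow_update=True)

    # Swap the row in Postgres: delete the stale entry, insert the patched node.
    # delete_nodes may not exist on older vector store releases.
    vector_store.delete_nodes(node_ids=[node_id])
    vector_store.add([node])

storage_context.persist(persist_dir="./storage")
```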