Are there any viable approaches for this

At a glance

The community member is asking for viable approaches to update an ElasticSearchVectorstore index with only the PDF files that have changed or been removed, without having to re-index all the files again, in order to save on embedding costs. Another community member provided an example for a similar issue when using a vector database integration, mentioning that the process is more complicated and requires persisting/maintaining extra files. The example is for ChromaDB, but the community member notes it should work for any vector store.

bbenzen

Are there any viable approaches for this issue? When loading all the documents for an index using a PDF loader, how can I update the index with only the PDF files that have changed or been removed, using the ElasticSearchVectorstore, without re-indexing all the files again? The goal is to save on embedding costs.

2 comments

Logan M

I wrote an example for this the other day

When using a vectordb integration, the process is a little more complicated: you have to persist/maintain extra files.

https://discord.com/channels/1059199217496772688/1163880111074971790/1163900056718553169

Logan M

(The example is for chromadb, but it should work for any vector store)
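
For readers who cannot open the linked message, here is a rough, library-agnostic sketch of the "persist/maintain extra files" idea. It is not taken from the linked example: it assumes the extra file is a JSON manifest of content hashes, and the names `file_hash`, `diff_against_manifest`, `save_manifest`, and `manifest.json` are hypothetical. It only works out which PDFs need re-embedding and which need their old vectors removed. The actual insert and delete calls depend on the vector store integration and are deliberately left out.

```python
import hashlib
import json
from pathlib import Path


def file_hash(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def diff_against_manifest(pdf_dir: Path, manifest_path: Path):
    """Compare the PDFs on disk with the hashes saved on the previous run.

    Returns (to_embed, to_delete, new_manifest):
      to_embed  - file names that are new or whose content changed
      to_delete - file names whose old vectors are stale (changed or removed)
    """
    # Hypothetical manifest format: {"file name": "sha256 hex digest"}
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new = {p.name: file_hash(p) for p in sorted(pdf_dir.glob("*.pdf"))}

    to_embed = [name for name, digest in new.items() if old.get(name) != digest]
    to_delete = [name for name, digest in old.items() if new.get(name) != digest]
    return to_embed, to_delete, new


def save_manifest(manifest_path: Path, manifest: dict) -> None:
    """Persist the hashes so the next run can detect changes."""
    manifest_path.write_text(json.dumps(manifest, indent=2))
```

On each run you would call `diff_against_manifest`, delete the vectors for every name in `to_delete`, embed and insert only the files in `to_embed`, and then call `save_manifest`. Unchanged files are never re-embedded, which is where the cost saving comes from. This assumes each stored chunk carries its source file name (or a similar identifier) so that its vectors can be found and deleted later.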
