Also, I found out that managing document updates in production is really hard. I have a client who gets new documents very often. So far it has been manual, but later I want to automate it. I'm still not sure what the best approach is, or which vector database handles updating old documents with new ones best.
For new docs, you can always insert them into the existing index. For a large number of docs, I would suggest using a dedicated vector database (Qdrant works like a charm for me).
Also, if the same documents get updated and there are a lot of them, I prefer to build a new collection, point my vector index at it, and remove the previous collection. That avoids a lot of unnecessary per-document conditional checks.
I run a cron job that creates the new collection in the background; once it's created, I simply point the index at it and delete the previous one.
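A minimal sketch of that rebuild-and-swap idea, assuming a database with named collections and an alias that queries go through (Qdrant does support collection aliases, but the `InMemoryStore` class, `fake_embed`, and `rebuild_and_swap` below are hypothetical stand-ins, not any library's API). The cron job would just call `rebuild_and_swap` on a schedule:

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a vector DB with collections and aliases."""
    collections: dict = field(default_factory=dict)  # name -> list of (id, vector, payload)
    aliases: dict = field(default_factory=dict)      # alias -> collection name

    def create_collection(self, name):
        self.collections[name] = []

    def upsert(self, name, points):
        self.collections[name].extend(points)

    def delete_collection(self, name):
        del self.collections[name]

    def search_alias(self, alias):
        # Readers always query through the alias, never a concrete collection name.
        return self.collections[self.aliases[alias]]


def fake_embed(text):
    # Hypothetical embedding; replace with a real embedding model.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def rebuild_and_swap(store, alias, docs, embed=fake_embed):
    """Build a fresh collection from `docs`, repoint `alias` at it, drop the old one."""
    new_name = f"{alias}_{uuid.uuid4().hex}"  # unique name per rebuild
    store.create_collection(new_name)
    points = [(doc_id, embed(text), {"doc_id": doc_id}) for doc_id, text in docs.items()]
    store.upsert(new_name, points)

    old_name = store.aliases.get(alias)
    store.aliases[alias] = new_name  # the switch: queries see the new data from here on
    if old_name is not None:
        store.delete_collection(old_name)
    return new_name
```

Because readers only ever see a fully built collection, there is no window where the index is half updated; the trade-off is re-embedding everything on every run.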
For updating existing documents, you can tag the nodes with a hash or the filename; then, when a document is updated, you delete all nodes with that hash or filename and insert the new nodes.
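A rough sketch of the tag-and-replace approach, assuming each node carries `filename` and `hash` metadata. A plain Python list stands in for the vector store here; in a real database the delete step would be a delete filtered on the filename metadata field. `Node`, `chunk`, and `sync_document` are hypothetical names for illustration:

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Node:
    text: str
    metadata: dict


def content_hash(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def chunk(text, size=200):
    # Hypothetical naive fixed-size chunker; use your framework's splitter instead.
    return [text[i:i + size] for i in range(0, len(text), size)] or [""]


def sync_document(nodes, filename, text):
    """Insert or replace all nodes for `filename`; skip if content is unchanged.

    `nodes` is a plain list standing in for the vector store.
    Returns "inserted", "updated", or "unchanged".
    """
    new_hash = content_hash(text)
    existing = [n for n in nodes if n.metadata["filename"] == filename]
    if existing and existing[0].metadata["hash"] == new_hash:
        return "unchanged"
    # Delete every node tagged with this filename, then insert the fresh chunks.
    nodes[:] = [n for n in nodes if n.metadata["filename"] != filename]
    nodes.extend(Node(c, {"filename": filename, "hash": new_hash}) for c in chunk(text))
    return "updated" if existing else "inserted"
```

Comparing the stored hash first means unchanged files cost no re-embedding, and brand-new files fall through to a plain insert, which covers the "new docs" case as well.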