Hi team,
Thanks for all the wonderful work you guys have been doing.
Could someone help me with a question about the Ingestion Pipeline and document management in LlamaIndex?
I have found, and confirmed by experimenting, that the docstore removes duplicate documents when they are ingested through an Ingestion Pipeline that also has a vector store configured.
Does the same apply to the vector store, though? That is, are the embeddings and other metadata stored for a duplicate document removed automatically?
If this is supposed to work, it is not happening for me. If I ingest 2 documents using the Ingestion Pipeline, the docstore has 2 documents. If I re-ingest the same documents, the docstore still has only 2 documents, but the vector store appears to work in append-only mode, and the number of documents (counted by nodes) in the index store keeps increasing.
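To make the behaviour concrete, here is a toy model in plain Python. It is not LlamaIndex code and does not claim to show how the library works internally: `ToyDocstore`, `ToyVectorStore`, `ingest`, and the `skip_unchanged` flag are hypothetical names I made up for illustration. With `skip_unchanged=False` it reproduces the counts I am seeing; with `skip_unchanged=True` it shows the counts I was hoping for.

```python
import hashlib
from dataclasses import dataclass, field


def content_hash(text: str) -> str:
    """Hash the document text so unchanged content can be recognised."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class ToyDocstore:
    """Hypothetical stand-in for a docstore: maps doc_id -> content hash."""
    hashes: dict[str, str] = field(default_factory=dict)

    def is_unchanged(self, doc_id: str, text: str) -> bool:
        return self.hashes.get(doc_id) == content_hash(text)

    def put(self, doc_id: str, text: str) -> None:
        # Keyed by doc_id, so re-ingesting never adds a second entry.
        self.hashes[doc_id] = content_hash(text)


@dataclass
class ToyVectorStore:
    """Hypothetical stand-in for a vector store that can only append."""
    rows: list[tuple[str, str]] = field(default_factory=list)

    def add(self, doc_id: str, text: str) -> None:
        self.rows.append((doc_id, text))


def ingest(docs, docstore, vector_store, skip_unchanged):
    """Ingest docs; optionally skip documents whose hash is already stored."""
    for doc_id, text in docs.items():
        if skip_unchanged and docstore.is_unchanged(doc_id, text):
            continue  # duplicate: touch neither store
        docstore.put(doc_id, text)
        vector_store.add(doc_id, text)


def run_twice(skip_unchanged: bool) -> tuple[int, int]:
    """Ingest the same 2 documents twice; return (docstore, vector store) sizes."""
    docs = {"doc-1": "first document", "doc-2": "second document"}
    docstore, vector_store = ToyDocstore(), ToyVectorStore()
    for _ in range(2):
        ingest(docs, docstore, vector_store, skip_unchanged)
    return len(docstore.hashes), len(vector_store.rows)


observed = run_twice(skip_unchanged=False)  # what I am seeing
expected = run_twice(skip_unchanged=True)   # what I was hoping for
print("observed (docstore, vector store):", observed)
print("expected (docstore, vector store):", expected)
```

In the first run the docstore stays at 2 entries while the vector store grows to 4; in the second both stay at 2. My question is whether the pipeline is meant to behave like the second case.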
Any help/guidance is much appreciated.
Reference link:
https://docs.llamaindex.ai/en/stable/examples/ingestion/document_management_pipeline/