The community member is asking how to version their documents (Notion, PDF, etc.) for a RAG pipeline, as updating the documentation would require re-vectorizing the entire data. The comments suggest two potential solutions:
1. Setting filename_as_id=True when reading the documents, which will use the filename as a unique document ID. When a file is updated, the community member can remove all the nodes for that file ID and insert the updated file.
2. An automated approach using the LlamaIndex library, which compares document hashes and re-indexes only the documents whose content has changed (that is, where the document ID matches but the hash does not).
There is no explicitly marked answer, but the community members have provided suggestions to address the versioning issue.
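The hash comparison behind the second suggestion can be sketched in plain Python. This is only an illustration of the idea, not LlamaIndex's actual implementation: the `content_hash` and `refresh` helpers and the `stored_hashes` dictionary are hypothetical names, and a real pipeline would re-embed the returned documents rather than just list them.

```python
import hashlib


def content_hash(text: str) -> str:
    """Fingerprint a document's text so changes can be detected cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def refresh(stored_hashes: dict[str, str], documents: dict[str, str]) -> list[str]:
    """Return the IDs of new or changed documents.

    stored_hashes maps doc ID -> hash recorded at the last indexing run
    (hypothetical bookkeeping) and is updated in place.
    documents maps doc ID -> current text.
    """
    changed = []
    for doc_id, text in documents.items():
        digest = content_hash(text)
        # A missing ID means a new document; a different hash means an edit.
        if stored_hashes.get(doc_id) != digest:
            stored_hashes[doc_id] = digest
            changed.append(doc_id)
    return changed
```

Only the IDs returned by `refresh` need to be re-vectorized; everything else keeps its existing embeddings.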
How can I version my documents (Notion, PDF, etc.) for a RAG pipeline? Let's say there is an update in the documentation; then I would have to vectorize the complete data again.
I think you can add filename_as_id=True while reading the docs.
This will use the filename as the unique doc ID. When an existing file gets updated, you can remove all the nodes for that file ID and insert the updated file.
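The remove-and-reinsert step can be sketched without any library, assuming each node remembers the filename it came from. The `Node` class, `split_into_nodes`, and `replace_document` below are hypothetical stand-ins for a vector store's own node type and insert/delete calls, and the fixed-size chunking is only a placeholder for a real splitter and embedding step.

```python
from dataclasses import dataclass


@dataclass
class Node:
    doc_id: str  # source filename, playing the role of the document ID
    text: str    # chunk text; a real node would also carry an embedding


def split_into_nodes(doc_id: str, text: str, chunk_size: int = 20) -> list[Node]:
    """Cut a document into fixed-size chunks (placeholder for a real splitter)."""
    return [Node(doc_id, text[i:i + chunk_size]) for i in range(0, len(text), chunk_size)]


def replace_document(store: list[Node], doc_id: str, new_text: str) -> list[Node]:
    """Drop every node belonging to doc_id, then add nodes for the new text."""
    kept = [node for node in store if node.doc_id != doc_id]
    return kept + split_into_nodes(doc_id, new_text)
```

Because nodes from other files are left untouched, only the updated file has to be chunked and embedded again.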