The community members are discussing how to make local indexing idempotent, so that when a new file is added to a folder and ingestion is run, only the new file is updated rather than all files. The suggestions include using an ingestion pipeline, docstore, and vector store, as well as tracking what documents have been inserted to avoid re-ingesting everything. There is also discussion around getting timestamps from Whisper and adding them to the metadata, as well as using the filename as a unique identifier. One community member is trying to create a quick RAG chat for their friends to interact with BJJ DVDs and have the language model point them to the relevant timestamp where their question is answered.
Seems like yes with Whisper. Is there a good way to make local indexing idempotent, i.e. if I add a new file to the folder and run ingestion, it only processes that one file rather than all of them?
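One library-agnostic way to get this behaviour is to keep a small manifest of content hashes and only ingest files whose hash is missing or different. The sketch below is an assumption about how that could look, not the ingestion pipeline/docstore setup suggested in the thread; the names `find_changed_files`, `mark_ingested`, and the manifest file are hypothetical, and the actual embedding/indexing step is left to whatever framework is in use.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in 1 MiB blocks so large video/audio files are not loaded at once.
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def load_manifest(manifest_path: Path) -> dict[str, str]:
    """Load the {filename: digest} map, or an empty map on the first run."""
    if manifest_path.exists():
        return json.loads(manifest_path.read_text())
    return {}


def find_changed_files(folder: Path, manifest_path: Path) -> list[Path]:
    """Return files that are new or whose contents changed since the last run.

    Assumption: the manifest lives OUTSIDE `folder`, so it is never
    picked up as a file to ingest.
    """
    seen = load_manifest(manifest_path)
    changed = []
    for path in sorted(p for p in folder.iterdir() if p.is_file()):
        # The filename is the unique identifier; the digest detects edits.
        if seen.get(path.name) != file_digest(path):
            changed.append(path)
    return changed


def mark_ingested(paths: list[Path], manifest_path: Path) -> None:
    """Record the given files as ingested so the next run skips them."""
    seen = load_manifest(manifest_path)
    for path in paths:
        seen[path.name] = file_digest(path)
    manifest_path.write_text(json.dumps(seen, indent=2))
```

Typical use would be: call `find_changed_files`, run transcription and indexing on just those paths, then call `mark_ingested` once that succeeds. Running ingestion twice in a row without adding files then does no work, which is the idempotent behaviour asked about.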
I'm trying to throw together a super quick RAG chat for my friends so they can talk to BJJ DVDs, since the DVDs get verbose and annoying, and then have the LLM point them to the timestamp where their question is answered.
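For the timestamp part, a rough sketch of turning Whisper output into chunks that carry their own timestamps might look like the following. It assumes the segments are shaped like the `"segments"` list in the result of openai-whisper's `transcribe()` (dicts with `start`, `end`, and `text`); the helper names, the record layout, and the sample data are hypothetical, and the filename is used as the document identifier as suggested in the discussion.

```python
from __future__ import annotations


def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_records(filename: str, segments: list[dict]) -> list[dict]:
    """Turn Whisper-style segments into text chunks with timestamp metadata.

    Each record keeps the raw start/end offsets (useful for seeking) and a
    human-readable timestamp the LLM can quote back to the user.
    """
    records = []
    for seg in segments:
        records.append(
            {
                "text": seg["text"].strip(),
                "metadata": {
                    "file_name": filename,  # acts as the unique identifier
                    "start": seg["start"],
                    "end": seg["end"],
                    "timestamp": format_timestamp(seg["start"]),
                },
            }
        )
    return records


# Made-up sample segments, for illustration only.
sample_segments = [
    {"start": 0.0, "end": 6.5, "text": " Start from closed guard."},
    {"start": 3725.4, "end": 3731.0, "text": " Now finish the armbar."},
]
records = segments_to_records("dvd1_disc2.mp4", sample_segments)
```

If each record is indexed as its own chunk with this metadata attached, the retrieved chunk for a question already carries the file name and timestamp, so the LLM only has to repeat them rather than guess.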