
Updated 10 months ago


At a glance

The community members are discussing how to make local indexing idempotent, so that when a new file is added to a folder and ingestion is run, only the new file is updated rather than all files. The suggestions include using an ingestion pipeline, docstore, and vector store, as well as tracking what documents have been inserted to avoid re-ingesting everything. There is also discussion around getting timestamps from Whisper and adding them to the metadata, as well as using the filename as a unique identifier. One community member is trying to create a quick RAG chat for their friends to interact with BJJ DVDs and have the language model point them to the relevant timestamp where their question is answered.

Seems like yes with Whisper. Is there a good way to make local indexing idempotent, i.e. if I add a new file to the folder and run ingestion, it only updates that one file rather than all of them?
11 comments
Need to use an ingestion pipeline + docstore + vector store for that
Can I just use a local vector store in storage and still have it work?

In addition, is there an easy way to get timestamps from Whisper and add them to the metadata?
Not easily. You need some layer to track what documents have been inserted.
Also not sure about the Whisper thing
I'd probably have to implement my own doc loader and add the timestamp stuff to the metadata manually
I know Deepgram supports timestamps and diarization, but I don't know about Whisper
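For what it's worth, the open-source `openai-whisper` package does return segment-level timestamps: the dict from `model.transcribe(...)` has a `"segments"` list whose entries carry `"start"`, `"end"` (seconds) and `"text"`. A minimal sketch of the "add it to metadata manually" step, assuming that segment shape; the helper names `format_timestamp` and `segments_to_records` and the metadata keys are made up for illustration, and the records would still need to be wrapped in whatever document type your loader uses:

```python
from typing import Any


def format_timestamp(seconds: float) -> str:
    """Render a second offset as HH:MM:SS so a chat answer can cite it."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_records(
    segments: list[dict[str, Any]], source: str
) -> list[dict[str, Any]]:
    """Turn Whisper-style segments into text + metadata records.

    Assumes each segment has "start", "end" and "text" keys, which is the
    shape openai-whisper produces. `source` is the file the audio came from.
    """
    records = []
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue  # skip empty segments (silence, music, etc.)
        records.append(
            {
                "text": text,
                "metadata": {
                    "source": source,
                    "start_s": float(seg["start"]),
                    "end_s": float(seg["end"]),
                    "timestamp": format_timestamp(seg["start"]),
                },
            }
        )
    return records


# Hand-written stand-in for result["segments"] from model.transcribe(...)
example_segments = [
    {"start": 0.0, "end": 4.2, "text": " Welcome back."},
    {"start": 3725.4, "end": 3731.0, "text": " Now grip the collar."},
    {"start": 3731.0, "end": 3732.0, "text": "   "},
]
example_records = segments_to_records(example_segments, "example.mp4")
```

Keeping the raw seconds next to the formatted string means you can still sort or filter by time later, while the LLM gets something readable to quote.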
Since documents get broken into nodes, you need some top-level tracking. With a consistent document ID, you can then compare hashes of the content to see what changed.

Hope that makes sense (this is what the example above is doing)
Ahhh, that makes sense. I guess, technically, I can use the filename as a "unique" identifier in some way
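A rough sketch of that idea using only the standard library, with the filename as the document ID and a SHA-256 of the file contents as the hash. This is not the ingestion pipeline's own docstore logic, just the same idea done by hand; the manifest file and the `ingest` callback are hypothetical stand-ins for whatever actually transcribes and indexes a file:

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def file_hash(path: Path) -> str:
    """Content hash, so an edited file is detected even if its name is unchanged."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def ingest_new_files(
    folder: Path, manifest_path: Path, ingest: Callable[[Path], None]
) -> list[str]:
    """Ingest only files that are new or changed since the last run.

    The manifest is a JSON dict of {filename: content hash}. The filename
    acts as the document ID, so this assumes names are unique in `folder`.
    """
    seen = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    processed = []
    for path in sorted(folder.iterdir()):
        if not path.is_file() or path.resolve() == manifest_path.resolve():
            continue
        digest = file_hash(path)
        if seen.get(path.name) == digest:
            continue  # unchanged since last run, skip it
        ingest(path)  # hypothetical: transcribe, chunk, embed, upsert
        seen[path.name] = digest  # record only after ingest succeeded
        processed.append(path.name)
    manifest_path.write_text(json.dumps(seen, indent=2))
    return processed
```

Running it twice in a row on the same folder should ingest nothing the second time. One thing this does not handle is deleted files: their old entries stay in the manifest (and their nodes stay in the vector store) unless you clean those up separately.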
I am trying to throw together a super quick RAG chat for my friends so they can talk to BJJ DVDs, since the DVDs get verbose and annoying, and then have the LLM point them to the timestamp where their question is answered