
Updated 2 years ago

Hello all! Thanks to GPT_Index I've managed to put together a script that queries my extensive personal note collection, which is a local directory of about 20k markdown files, some of which are very long. I work in this folder all day, every day, so there are frequent changes. Currently I need to rerun the entire indexing (is that the correct term?) whenever I want to incorporate edits I've made.

So my question is... is there a way to schedule indexing to run maybe once per day and only add information for files that have changed? Or even just run it manually but still only add the edits? This would make a huge difference in saving time (I have to leave it running overnight for the entire directory) as well as cost 😬.

Excuse me if this is a dumb question, I'm not a programmer and am sort of muddling around figuring this out 🤓

Thank you for making this sort of project accessible to someone like me!
7 comments
This is an interesting use case, thanks for raising.
Currently GPT Index supports building the index from scratch, and manual insertion of documents - it doesn't support deletions or updates at the moment (definitely a TODO we can work on).
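In the meantime, one rough way to avoid reprocessing all 20k files is to record a content hash per file and compare it on the next run. The sketch below uses only the Python standard library; the function names and the JSON manifest file are assumptions made for illustration and are not part of GPT Index. It only reports which files were added, modified, or deleted. Feeding those files back into the index is left to whatever insertion call your version of GPT Index provides.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def scan(notes_dir: Path) -> dict:
    """Map each markdown file's path (relative to notes_dir) to its digest."""
    return {
        p.relative_to(notes_dir).as_posix(): file_digest(p)
        for p in sorted(notes_dir.rglob("*.md"))
        if p.is_file()
    }


def diff_manifests(old: dict, new: dict):
    """Return (added, modified, deleted) relative paths, each list sorted."""
    added = sorted(k for k in new if k not in old)
    modified = sorted(k for k in new if k in old and old[k] != new[k])
    deleted = sorted(k for k in old if k not in new)
    return added, modified, deleted


def load_manifest(path: Path) -> dict:
    """Load the previous run's manifest, or an empty one on the first run."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def save_manifest(path: Path, manifest: dict) -> None:
    """Persist the manifest so the next run can diff against it."""
    path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
```

A daily run would then call `scan`, diff the result against `load_manifest`, handle only the added and modified files, and finish with `save_manifest`. Hashing 20k files is cheap compared with re-embedding them, since no API calls are involved.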

Another option you can try right now is to store your documents and embeddings in Weaviate (or store embeddings in Pinecone/Faiss), and then manually update the embeddings in those stores when the underlying documents change. You can then use our data loader to 1) connect to this data store, 2) optionally specify an embedding to fetch the top-k most similar documents, and 3) feed these documents into GPT Index.

Of course there will be a bit of engineering work with the second approach. Will keep you posted when we do support updates/deletions!
https://github.com/jerryjliu/gpt_index/blob/main/examples/data_connectors/WeaviateDemo.ipynb
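To make the update-then-query pattern concrete, here is a toy in-memory store. It is a hypothetical stand-in and does not use the Weaviate, Pinecone, or Faiss APIs; the class and method names are invented for illustration. Embeddings are plain lists of floats here, whereas in practice they would come from an embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory stand-in for an external vector store."""

    def __init__(self):
        self._vectors = {}  # doc_id -> embedding

    def upsert(self, doc_id, embedding):
        """Insert a new embedding or overwrite the one stored for doc_id."""
        self._vectors[doc_id] = list(embedding)

    def delete(self, doc_id):
        """Remove a document's embedding; a missing id is ignored."""
        self._vectors.pop(doc_id, None)

    def top_k(self, query, k=3):
        """Return the ids of the k embeddings most similar to query."""
        scored = [(cosine(query, vec), doc_id) for doc_id, vec in self._vectors.items()]
        # Highest similarity first; ties broken by doc_id for stable output.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [doc_id for _, doc_id in scored[:k]]
```

When a note changes, only that note is re-embedded and passed to `upsert`; a removed note is passed to `delete`. The ids returned by `top_k` are the documents you would then load and feed into GPT Index.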
@jerryjliu0 if you add support for vectorstores in langchain you should get adding for free (has add_texts method)! https://github.com/hwchase17/langchain/blob/c5f0af93988f97fb5b05f0d6e8c811005a654356/langchain/vectorstores/base.py#L15
@hwchase17 we'll add the integration, but does this support update/delete too? there's currently a way to add docs to a vector store index within gpt index
ah no - sorry i misread "manual insertion" as having to insert into the underlying store, rather than having a standard interface
ah no worries! we'll still try to add the integration this week though
woo! i may have some time as well, can try to take a stab
@arminta7 just a kudos to you -- I AM (or maybe was) a programmer and I am still gun-shy about trying to implement GPT_Index... Keep on going and you may have to stop telling people you're "not a programmer" very soon... Would love to read any writeup you come up with to document your process!