Hello all Thanks to GPT Index I ve

Hello all! Thanks to GPT_Index I've managed to put together a script that queries my extensive personal note collection, which is a local directory of about 20k markdown files, some of which are very long. I work in this folder all day, every day, so there are frequent changes. Currently I would need to rerun the entire indexing (is that the correct term?) whenever I want to incorporate edits I've made.

So my question is... is there a way to schedule indexing to run maybe once per day and only add information for files that have changed? Or even just run it manually but still only add edits? This would make a huge difference in saving time (I have to leave it running overnight for the entire directory) as well as cost 😬.

Excuse me if this is a dumb question, I'm not a programmer and am sort of muddling around figuring this out 🤓

Thank you for making this sort of project accessible to someone like me!

7 comments

jerryjliu0

This is an interesting use case, thanks for raising.
Currently GPT Index supports building the index from scratch, and manual insertion of documents - it doesn't support deletions or updates at the moment (definitely a TODO we can work on).

Another option you can try right now is to store your documents and embeddings in Weaviate (or store embeddings in Pinecone/Faiss), and then manually update embeddings in those document stores when embeddings change. You can then use our data loader to 1) connect to this data store, and 2) optionally specify an embedding to fetch the top-k most similar documents, and 3) feed these documents into GPT Index.
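To make the "manually update embeddings" idea concrete, here is a minimal sketch in plain Python. It is not the Weaviate, Pinecone, Faiss, or GPT Index API: `TinyVectorStore` and its methods are hypothetical names standing in for a real store, and the embeddings are assumed to be precomputed lists of floats of equal length.

```python
import math
from typing import Dict, List, Tuple


def _cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory stand-in for an external embedding store."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def upsert(self, doc_id: str, embedding: List[float]) -> None:
        # Insert a new document or overwrite the embedding of an edited one.
        self._vectors[doc_id] = list(embedding)

    def delete(self, doc_id: str) -> None:
        # Drop a document that no longer exists; missing ids are ignored.
        self._vectors.pop(doc_id, None)

    def top_k(self, query: List[float], k: int = 3) -> List[Tuple[str, float]]:
        # Score every stored vector against the query and keep the k best.
        scored = [(doc_id, _cosine(query, vec)) for doc_id, vec in self._vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]
```

With a store shaped like this, an edited note only needs one `upsert` with its new embedding, and the documents returned by `top_k` are the ones you would then pass on to the index for answering a query.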

Of course there will be a bit of engineering work with the second approach. Will keep you posted when we do support updates/deletions!
https://github.com/jerryjliu/gpt_index/blob/main/examples/data_connectors/WeaviateDemo.ipynb
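For the "only files that have changed" part of the question, the detection step can be done outside GPT Index entirely. The sketch below is one possible approach, not something GPT Index provides: it hashes every markdown file, compares the hashes against a manifest saved on the previous run, and reports which files need re-embedding. All function names and the manifest format are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List, Tuple


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents; changes whenever the note is edited."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def scan_notes(notes_dir: Path) -> Dict[str, str]:
    """Map each markdown file (path relative to notes_dir) to its digest."""
    return {
        p.relative_to(notes_dir).as_posix(): file_digest(p)
        for p in sorted(notes_dir.rglob("*.md"))
    }


def diff_manifests(old: Dict[str, str], new: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (new or edited files, deleted files) between two scans."""
    changed = sorted(name for name, digest in new.items() if old.get(name) != digest)
    removed = sorted(name for name in old if name not in new)
    return changed, removed


def load_manifest(path: Path) -> Dict[str, str]:
    """Read the manifest from the previous run, or start empty on the first run."""
    return json.loads(path.read_text()) if path.exists() else {}


def save_manifest(path: Path, manifest: Dict[str, str]) -> None:
    """Persist the current scan so the next run can diff against it."""
    path.write_text(json.dumps(manifest, indent=2))
```

A daily job (cron, Task Scheduler, or a manual run) would call `scan_notes`, diff the result against `load_manifest`, re-embed only the `changed` files, and then call `save_manifest`. Since updates and deletions are not supported in the index itself at the moment, the edited and removed files would have to be handled in the external store.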

hwchase17

@jerryjliu0 if you add support for vectorstores in langchain you should get adding for free (has add_texts method)! https://github.com/hwchase17/langchain/blob/c5f0af93988f97fb5b05f0d6e8c811005a654356/langchain/vectorstores/base.py#L15

jerryjliu0

@hwchase17 we'll add the integration, but does this support update/delete too? there's currently a way to add docs to a vector store index within gpt index

hwchase17

ah no - sorry i misread "manual insertion" as having to insert into the underlying store, rather than having a standard interface

jerryjliu0

ah no worries! we'll still try to add the integration this week though

hwchase17

woo! i may have some time as well, can try to take a stab

oskar

@arminta7 just a kudos to you -- I AM (or maybe was) a programmer and I am still gun-shy trying to implement GPT_Index... Keep on going and you may have to stop telling people you're "not a programmer" very soon... Would love to read any writeup you come up with to document your process!
