
Updated 6 months ago

Currently I'm not using any vector DB, I use the storage context to just save the vector store as JSON

At a glance

The community member is using the LlamaIndex default vector store, saving it as JSON via the storage context, but they are unsure how to compare the existing default_vector_store.json contents with the new set of documents they are trying to embed. They had hoped that VectorStoreIndex.from_documents would provide an API for this, but they couldn't find one.

The comments suggest that as long as the documents have a consistent ID, the community member can use index.refresh_ref_docs(documents) to update the index. They also discuss the desired behavior when running the generate.py script of the create-llama template: updating the index with new documents and removing old ones.

The community members provide suggestions on how to handle the first and subsequent calls to update the index, such as using VectorStoreIndex.from_documents initially and then refresh_ref_docs for updates. They also discuss the possibility of writing a script to check for changes in the files and only sending the different files to LlamaParse.

The community member concludes that they can load the current index from the JSON file and compare it against the document store, and that they need to override the default random UUID assignment so that document IDs stay consistent across runs.

Currently I'm not using any vector DB; I use the storage context to just save the vector store as JSON (which is the LlamaIndex default, I think). I don't know how to compare the default_vector_store.json's information with the new set of Documents that I'm trying to embed. I had hoped that VectorStoreIndex.from_documents would give me an API for this, but unfortunately I don't see one.
7 comments
As long as your documents have a consistent ID every time, you can use index.refresh_ref_docs(documents).
So in the first call I would use from_documents, and in subsequent calls I would only call refresh_ref_docs?
I want this behavior when I execute the generate.py of the create-llama template. Every time I run generate.py, I want the index to be updated with potentially new docs (and also remove old ones, if possible).
So on the first execution, do VectorStoreIndex.from_documents(documents, storage_context=storage_context, show_progress=True). How would I then, on the second call, guarantee that my *.json files are properly loaded and updated accordingly, removing existing and adding new documents?
When you fetch all the files from your API, could you write a small script to check which files are the same as or different from the files you've already parsed into your vector database, then send only the different files to LlamaParse?

Or specifically you won't know because it could be just an edit in a document (like changing a sentence in one of the documents but not uploading new documents)?
I mean, that's what I'm trying to do, but how would I do this?
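One way to do the change check suggested above is to keep a manifest of content hashes from the last run and compare against it. A sketch in plain Python (the manifest filename and directory layout are assumptions); it catches both new files and in-place edits, since an edited sentence changes the hash:

```python
# Detect new or edited files by comparing SHA-256 content hashes
# against a manifest saved on the previous run.
import hashlib
import json
from pathlib import Path

MANIFEST = Path("parsed_manifest.json")  # hypothetical hash manifest

def file_hash(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def changed_files(data_dir: str) -> list[Path]:
    """Return files that are new or whose content changed since last run."""
    old = json.loads(MANIFEST.read_text()) if MANIFEST.exists() else {}
    new, changed = {}, []
    for path in sorted(Path(data_dir).glob("**/*")):
        if not path.is_file():
            continue
        digest = file_hash(path)
        new[str(path)] = digest
        if old.get(str(path)) != digest:
            changed.append(path)  # new file or edited content
    MANIFEST.write_text(json.dumps(new, indent=2))
    return changed
```

Only the paths returned by changed_files would then need to go through LlamaParse again.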
Ah okay, I think I can load the current index from my JSON file and then compare against the docstore. Currently my documents get assigned a random UUIDv4, which I think is the default. So I would have to override this to make it consistent. Thanks @Logan M and @titus for the tips
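One way to make the IDs consistent across runs, as concluded above, is to replace the random UUIDv4 with a deterministic UUIDv5 derived from something stable like the file path (deriving from the path is an assumption; any stable key works):

```python
# Deterministic document IDs: the same file path always yields the same
# UUID, so refresh_ref_docs can match re-parsed files to stored docs.
import uuid

def doc_id_for(file_path: str) -> str:
    # uuid5 is deterministic: same namespace + same name -> same UUID
    return str(uuid.uuid5(uuid.NAMESPACE_URL, file_path))
```

With LlamaIndex you would then set `doc.id_ = doc_id_for(path)` on each Document before indexing, so re-running generate.py produces matching IDs instead of fresh random ones.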