
Currently I'm not using any vector DB; I use the storage context to just save the vector store as JSON (which is the LlamaIndex default, I think). I don't know how to compare the information in default_vector_store.json with the new set of Documents I'm trying to embed. I had hoped that VectorStoreIndex.from_documents would give me an API for this, but unfortunately I don't see one.
7 comments
As long as your documents have a consistent id every time, you can use index.refresh_ref_docs(documents)
So on the first call I would use from_documents, and in subsequent calls I would only call refresh_ref_docs?
I want this behavior when I execute the generate.py of the llama-create template. Every time I run generate.py I want the index to be updated with potentially new docs (and old ones removed, if possible).
So on the first execution I would do VectorStoreIndex.from_documents(documents, storage_context=storage_context, show_progress=True). How would I then, on the second call, guarantee that my *.json files are properly loaded and updated accordingly, with stale documents removed and new ones added?
When you fetch all the files from your API, could you write a logic script to check which files are the same as or different from the files you've already parsed into your vector database, then send only the different files to LlamaParse?

Or, specifically, you won't know because it could be just an edit within a document (like a changed sentence in one of the documents, without any new documents being uploaded)?
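One way to sketch that "compare before re-parsing" logic with the standard library alone: hash every file's contents and diff against a manifest saved from the previous run, so edits inside a file are caught too. The manifest layout here is made up for illustration.

```python
import hashlib


def hash_files(paths):
    """Map each path to a SHA-256 hex digest of its contents."""
    out = {}
    for path in paths:
        with open(path, "rb") as f:
            out[path] = hashlib.sha256(f.read()).hexdigest()
    return out


def diff_manifest(old, new):
    """Return (added, changed, removed) path sets between two manifests."""
    added = set(new) - set(old)
    removed = set(old) - set(new)
    changed = {p for p in set(old) & set(new) if old[p] != new[p]}
    return added, changed, removed
```

Only files in `added | changed` would need to go to LlamaParse; `removed` maps to deletions from the index.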
I mean, that's what I'm trying to do, but how would I do this?
Ah okay, I think I can load the current index from my JSON file and then compare against the docstore. Currently my documents get assigned a random uuid4, which I think is the default. So I would have to override this to make it consistent. Thanks @Logan M and @titus for the tips
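A minimal way to replace the random uuid4 with a stable id is a name-based uuid5 derived from the file path, so the same path always produces the same id across runs (the helper name below is illustrative; in llama_index you would assign the result to a Document's `id_` before indexing, or just pass `filename_as_id=True` to SimpleDirectoryReader):

```python
import uuid


def stable_doc_id(path: str) -> str:
    """Deterministic UUID for a file path: same input, same id, every run."""
    # NAMESPACE_URL is an arbitrary but fixed namespace; any fixed one works.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, path))
```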