
Updated 5 months ago

Update Index

At a glance

The community members are discussing how to update a locally stored index without reconstructing the entire index. A community member suggests using the insert() method to insert new documents and the update_ref_doc() method to update existing documents. They note that for updating, the document must have a doc_id specified. Another community member asks about updating documents where the additional text is not at the end of the document, and whether the insert() method automatically filters out existing documents. The answer is to use the refresh() method when adding a batch that mixes existing and new documents, and to note that a doc_id is required for all of these operations.

Is there a way to update a locally stored index rather than reconstruct it each time? I'm thinking about the scenario where you want to update it whenever a document changes, but perhaps not reconstruct the entire index over the entire document. Not sure if that's possible though.
It is possible. You just need to use the insert() method on the index.

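```python
from llama_index.core import (
    Document, SimpleDirectoryReader, StorageContext, load_index_from_storage,
)  # llama-index >= 0.10; older releases import these from `llama_index`

# Reload the locally persisted index ("./storage" is a placeholder path)
index = load_index_from_storage(StorageContext.from_defaults(persist_dir="./storage"))

# Inserting new documents ("./data" is a placeholder path)
docs = SimpleDirectoryReader("./data").load_data()
for doc in docs:
    index.insert(doc)

# For updating an existing record
# NOTE: the document must have a `doc_id` matching the one already in the index
doc = Document(text="Brand new document text", doc_id="SAME AS THE DOC YOU WANT TO UPDATE")
index.update_ref_doc(
    doc,
    update_kwargs={"delete_kwargs": {"delete_from_docstore": True}},
)

# Write the changes back to disk
index.storage_context.persist(persist_dir="./storage")
```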
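A rough mental model of what update_ref_doc() does: the index remembers which chunks (nodes) came from each doc_id, and an update deletes all of them and then re-inserts the whole new document. That is why it doesn't matter where in the document the edit happened, and why the rest of the index is left alone. The sketch is a toy stand-in in plain Python, not LlamaIndex's implementation: `ToyIndex`, `CHUNK_SIZE` and the fixed-size character chunking are all hypothetical (real splitters work on tokens/sentences, and real node IDs are random UUIDs).

```python
from collections import defaultdict

CHUNK_SIZE = 20  # hypothetical toy chunk size, in characters


class ToyIndex:
    """Toy model: chunks keyed by node ID, grouped under their source doc_id."""

    def __init__(self):
        self.nodes = {}                         # node_id -> chunk text
        self.ref_doc_info = defaultdict(list)   # doc_id -> [node_id, ...]

    def insert(self, doc_id, text):
        # Split the document into chunks and remember which doc each came from.
        for i in range(0, len(text), CHUNK_SIZE):
            node_id = f"{doc_id}-{i // CHUNK_SIZE}"
            self.nodes[node_id] = text[i:i + CHUNK_SIZE]
            self.ref_doc_info[doc_id].append(node_id)

    def delete_ref_doc(self, doc_id):
        # Remove every chunk that was derived from this document.
        for node_id in self.ref_doc_info.pop(doc_id, []):
            self.nodes.pop(node_id, None)

    def update_ref_doc(self, doc_id, text):
        # An update is a delete followed by a re-insert of the whole document.
        self.delete_ref_doc(doc_id)
        self.insert(doc_id, text)


index = ToyIndex()
index.insert("a", "First version of document A.")
index.insert("b", "Document B is never touched.")
index.update_ref_doc("a", "Document A, edited in the middle, then re-chunked.")

print({doc_id: len(ids) for doc_id, ids in index.ref_doc_info.items()})  # {'b': 2, 'a': 3}
print(index.nodes["b-0"] + index.nodes["b-1"])  # Document B is never touched.
```

Only document "a" was re-chunked; document "b" kept its original chunks, which is the "don't rebuild everything" behaviour the question is after.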


For more, refer here:
https://docs.llamaindex.ai/en/stable/module_guides/indexing/document_management.html
So I would need to parse the new text myself, find the doc ID I want to update, and then call

```python
index.update_ref_doc(
    doc,
    update_kwargs={"delete_kwargs": {"delete_from_docstore": True}},
)
```


What if the additional text is not at the tail end of the document? If this is covered in the URL, I apologize; I'll read through it now.
And for inserting new docs, does it automatically filter out existing docs, or do I need to pass in only the new ones?
No, insert() doesn't filter out docs that are already in the index. If you are adding docs that include both existing and new ones, I would suggest using refresh() instead.
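As far as I understand it, refresh() (which delegates to refresh_ref_docs()) takes the full list of documents, looks up each doc_id in the docstore and compares the stored hash with the new one: unseen IDs are inserted, changed ones go through update_ref_doc(), unchanged ones are skipped, and it returns one boolean per document. The sketch below models only that decision logic in plain Python; the `refresh` function and the dict standing in for the docstore are hypothetical, and LlamaIndex's own hash also covers metadata while this toy hashes text only.

```python
import hashlib


def text_hash(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def refresh(stored_hashes, documents):
    """Toy refresh: insert unseen doc_ids, update changed ones, skip unchanged.

    stored_hashes: dict of doc_id -> hash of the text currently indexed.
    documents: list of (doc_id, text) pairs, e.g. a fresh load of a folder.
    Returns one bool per document: True if it was inserted or updated.
    """
    refreshed = []
    for doc_id, text in documents:
        new_hash = text_hash(text)
        old_hash = stored_hashes.get(doc_id)
        if old_hash is None:
            stored_hashes[doc_id] = new_hash  # new doc: would be index.insert(...)
            refreshed.append(True)
        elif old_hash != new_hash:
            stored_hashes[doc_id] = new_hash  # changed: would be index.update_ref_doc(...)
            refreshed.append(True)
        else:
            refreshed.append(False)           # unchanged: nothing is re-embedded
    return refreshed


store = {"a.txt": text_hash("old A"), "b.txt": text_hash("B")}
docs = [("a.txt", "new A"), ("b.txt", "B"), ("c.txt", "C")]
print(refresh(store, docs))  # [True, False, True]
```

Running it a second time on the same batch would return all False, since every doc_id is now known and unchanged.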



But for all of this to work, you need doc IDs.
The index stores everything as chunks of information, so to update exactly the right chunks, it needs the ID of the document those chunks came from.
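Why stable doc IDs matter: refresh() and update_ref_doc() can only match a reloaded file to what is already indexed if that file gets the same doc_id on every load. If the ID is random each time, every file looks new and gets inserted again. A toy illustration in plain Python (`load` and `new_doc_ids` are hypothetical helpers). With SimpleDirectoryReader, passing `filename_as_id=True` is, as far as I know, one way to get IDs derived from the file paths.

```python
import uuid


def load(paths, stable_ids):
    """Simulate reading files; return (doc_id, path) pairs."""
    # stable_ids=True: use the path as doc_id; False: a fresh random UUID per load
    return [(path if stable_ids else str(uuid.uuid4()), path) for path in paths]


def new_doc_ids(indexed_ids, loaded):
    """doc_ids the index has never seen, i.e. what a refresh would insert."""
    return [doc_id for doc_id, _ in loaded if doc_id not in indexed_ids]


paths = ["data/a.txt", "data/b.txt"]

for stable_ids in (False, True):
    indexed_ids = {doc_id for doc_id, _ in load(paths, stable_ids)}  # first build
    reloaded = load(paths, stable_ids)  # later reload of the same, unchanged files
    label = "stable" if stable_ids else "random"
    print(label, len(new_doc_ids(indexed_ids, reloaded)))
# random 2  -> both files look new and would be inserted again as duplicates
# stable 0  -> both files are matched to what is already indexed
```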
ah I see; thanks for the pointers!