Updating an Existing Index in LlamaIndex

At a glance

The community member is trying to update an existing index by adding or editing documents. They are using a combination of MongoDB, Qdrant, and LlamaIndex to manage the data. The main issues they are facing are:

1. When trying to load the index, they get an exception that they need to specify an index_id, as a new index is created every time they run the code.

2. They have tried using VectorStoreIndex(nodes, storage_context=storage_context, show_progress=True, index_id="<index_id>") but it didn't work.

The community members discuss various approaches, such as setting store_nodes_override=True in the VectorStoreIndex constructor, and using index.set_index_id("some index id") to change the key in MongoDB. They also discuss the purpose of the index store and why some of the dictionaries might be empty.

There is no explicitly marked answer, but the community members seem to have found a solution that involves using load_index_from_storage(storage_context, index_id="some index id") to load the index with a known index_id.
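The failure mode and the fix can be pictured with a small toy model. This is only a sketch in plain Python, not LlamaIndex code: ToyIndexStore, its add/load methods, and the "email_index" id are hypothetical names made up for illustration. It assumes an index store is essentially a mapping from index_id to an index struct, that a fresh index gets a random UUID when no id is chosen, and that loading without an id only works when exactly one index is stored.

```python
import uuid


class ToyIndexStore:
    """Hypothetical stand-in for an index store: maps index_id -> index struct."""

    def __init__(self):
        self.structs = {}

    def add(self, index_id=None):
        # Assumption: a new index gets a random UUID unless an id is chosen.
        index_id = index_id or str(uuid.uuid4())
        # Writing under an existing id overwrites that entry instead of adding one.
        self.structs[index_id] = {"index_id": index_id}
        return index_id

    def load(self, index_id=None):
        if index_id is None:
            # Without an id, loading is only unambiguous for a single stored index.
            if len(self.structs) != 1:
                raise ValueError(
                    f"Found {len(self.structs)} indexes; please specify index_id."
                )
            return next(iter(self.structs.values()))
        return self.structs[index_id]


# Two runs without a fixed id leave two separate entries behind.
random_ids = ToyIndexStore()
random_ids.add()
random_ids.add()
try:
    random_ids.load()
    load_error = None
except ValueError as exc:
    load_error = str(exc)

# Two runs with the same fixed id keep a single entry that loads by name.
stable = ToyIndexStore()
stable.add("email_index")
stable.add("email_index")
loaded = stable.load("email_index")
```

In LlamaIndex terms, the calls named in the thread that play these roles are index.set_index_id("some index id") before persisting and load_index_from_storage(storage_context, index_id="some index id") when loading.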

llogiclord

How do I update (add or edit docs in) an existing index? I am not able to reuse the index.

Here is my code for saving data:


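# Imports assume the modular llama-index packages (core, MongoDB stores, Qdrant, OpenAI).
import qdrant_client
from llama_index.core import Settings, StorageContext, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.storage.docstore.mongodb import MongoDocumentStore
from llama_index.storage.index_store.mongodb import MongoIndexStore
from llama_index.vector_stores.qdrant import QdrantVectorStore

# process_emails_sync, ModelType, and the upper-case constants are defined elsewhere in the project.
email_docs = process_emails_sync(filtered_unprocessed_emails, user)
docstore = MongoDocumentStore.from_uri(uri=LLAMAINDEX_MONGODB_STORAGE_SRV)
parser = SentenceSplitter()
# Split the processed emails into nodes (the original snippet passed an undefined my_docs here).
nodes = parser.get_nodes_from_documents(email_docs)
docstore.add_documents(nodes)
Settings.llm = OpenAI(model=ModelType.OPENAI_GPT_4_o_MINI.value)
Settings.embed_model = OpenAIEmbedding(api_key=OPENAI_API_KEY)
client = qdrant_client.QdrantClient(url=QDRANT_API_URL, api_key=QDRANT_API_TOKEN)

vector_store = QdrantVectorStore(client=client, collection_name=LLAMAINDEX_QDRANT_COLLECTION_NAME)

index_store = MongoIndexStore.from_uri(uri=LLAMAINDEX_MONGODB_STORAGE_SRV)
storage_context = StorageContext.from_defaults(vector_store=vector_store, index_store=index_store, docstore=docstore)

# Each run builds a new index object on top of the shared stores.
index = VectorStoreIndex(nodes, storage_context=storage_context, show_progress=True)
index.storage_context.persist()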

When I try to load the index using the same storage context as above, I get an exception saying that I need to specify an index_id, because a new index is created every time I run the code above. How do I pass the index_id to the store so that it updates the existing index? Please note that I am already using doc_id correctly to ensure upserting of documents.

load_index_from_storage(storage_context=storage_context, index_id="8cebc4c8-9625-4a79-8544-4943b4182116")

I have tried using VectorStoreIndex(nodes, storage_context=storage_context, show_progress=True, index_id="<index_id>") but that approach didn't work.

33 comments

llogiclord

I do see documents in my MongoDB docstore, which is what I want, but nothing in the doc_store portion of the index store. My index looks like this:

{"_id":"602a8035-4b00-45d6-8b57-3c9646e4c07e","__data__":"{\"index_id\": \"602a8035-4b00-45d6-8b57-3c9646e4c07e\", \"summary\": null, \"nodes_dict\": {}, \"doc_id_dict\": {}, \"embeddings_dict\": {}}","__type__":"vector_store"}

Will VectorStoreIndex.from_vector_store(vector_store) disable saving of documents in the MongoDB?
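On the empty nodes_dict and doc_id_dict shown above: the thread brings up store_nodes_override=True in this context. One reading, stated here as an assumption rather than something the thread confirms, is that when the vector store already keeps the node text (as Qdrant would here), the index skips recording nodes in its own struct unless the override is set. The toy function below only illustrates that decision; record_nodes and its parameters are hypothetical names, not part of the LlamaIndex API.

```python
def record_nodes(node_ids, stores_text, store_nodes_override=False):
    """Toy model: return the nodes_dict an index struct might end up with.

    Assumption: nodes are recorded in the index struct only when the vector
    store does not keep node text itself, or when the override is set.
    """
    nodes_dict = {}
    if not stores_text or store_nodes_override:
        for node_id in node_ids:
            nodes_dict[node_id] = node_id
    return nodes_dict


# A text-storing vector store with no override: the struct stays empty.
without_override = record_nodes(["n1", "n2"], stores_text=True)

# The same store with the override: the struct lists every node.
with_override = record_nodes(["n1", "n2"], stores_text=True, store_nodes_override=True)
```

Under this reading, an empty nodes_dict would not by itself mean that documents failed to save; it would only mean the vector store is treated as the source of node text.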
