
Updated 2 days ago

Updating an Existing Index in LlamaIndex

How do I update (add or edit documents in) an existing index? I am not able to reuse the index.

Here is my code for saving data:
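```python
import qdrant_client
from llama_index.core import Settings, StorageContext, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.storage.docstore.mongodb import MongoDocumentStore
from llama_index.storage.index_store.mongodb import MongoIndexStore
from llama_index.vector_stores.qdrant import QdrantVectorStore

# process_emails_sync, ModelType and the upper-case constants are app-specific
email_docs = process_emails_sync(filtered_unprocessed_emails, user)

# Split the documents into nodes and save them in the MongoDB docstore
docstore = MongoDocumentStore.from_uri(uri=LLAMAINDEX_MONGODB_STORAGE_SRV)
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(email_docs)
docstore.add_documents(nodes)

Settings.llm = OpenAI(model=ModelType.OPENAI_GPT_4_o_MINI.value)
Settings.embed_model = OpenAIEmbedding(api_key=OPENAI_API_KEY)

# Embeddings go to Qdrant; the index metadata goes to MongoDB
client = qdrant_client.QdrantClient(url=QDRANT_API_URL, api_key=QDRANT_API_TOKEN)
vector_store = QdrantVectorStore(client=client, collection_name=LLAMAINDEX_QDRANT_COLLECTION_NAME)
index_store = MongoIndexStore.from_uri(uri=LLAMAINDEX_MONGODB_STORAGE_SRV)
storage_context = StorageContext.from_defaults(vector_store=vector_store, index_store=index_store, docstore=docstore)

index = VectorStoreIndex(nodes, storage_context=storage_context, show_progress=True)
index.storage_context.persist()
```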

When I try to load the index using the same storage context as above, I get an exception saying that I need to specify an index_id, because a new index is created every time I run the code above. How do I pass the index_id to the store so that it updates the existing index? Note that I am already setting doc_id correctly so that documents are upserted.

```python
load_index_from_storage(storage_context=storage_context, index_id="8cebc4c8-9625-4a79-8544-4943b4182116")
```

I have tried using VectorStoreIndex(nodes, storage_context=storage_context, show_progress=True, index_id="<index_id>") but that approach didn't work.
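A stdlib-only toy model of the situation described in the question may make the error easier to see. It is not LlamaIndex code: `ToyIndexStore`, `build_index` and `load_index` are hypothetical stand-ins. The assumption it encodes is that each constructor call registers a new index entry under a fresh random UUID, and that loading without an `index_id` only works when the store holds exactly one index.

```python
import uuid
from typing import Dict, Optional


class ToyIndexStore:
    """Hypothetical stand-in for a persistent index store (e.g. one backed by MongoDB)."""

    def __init__(self) -> None:
        self.structs: Dict[str, dict] = {}


def build_index(store: ToyIndexStore) -> str:
    # Each run without a fixed id registers a brand-new entry under a random UUID.
    index_id = str(uuid.uuid4())
    store.structs[index_id] = {"index_id": index_id, "nodes_dict": {}}
    return index_id


def load_index(store: ToyIndexStore, index_id: Optional[str] = None) -> dict:
    if index_id is None:
        # Without an id, loading is only unambiguous when exactly one index exists.
        if len(store.structs) != 1:
            raise ValueError(
                f"Expected a single index, found {len(store.structs)}; specify index_id."
            )
        return next(iter(store.structs.values()))
    return store.structs[index_id]


store = ToyIndexStore()
first_run = build_index(store)
second_run = build_index(store)  # re-running the ingestion script adds another entry

try:
    load_index(store)
except ValueError as err:
    print(err)  # Expected a single index, found 2; specify index_id.

print(load_index(store, first_run)["index_id"] == first_run)  # True
```

In this model, every run leaves one more randomly keyed entry behind, which is why a stable, known id is needed.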
I do see the documents I want in my MongoDB docstore, but nothing in the docstore portion of the index store. My index looks like this: {"_id":"602a8035-4b00-45d6-8b57-3c9646e4c07e","__data__":"{\"index_id\": \"602a8035-4b00-45d6-8b57-3c9646e4c07e\", \"summary\": null, \"nodes_dict\": {}, \"doc_id_dict\": {}, \"embeddings_dict\": {}}","__type__":"vector_store"}
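The `__data__` field in that record is itself a JSON-encoded string, so it has to be decoded a second time before the dicts can be inspected. A minimal stdlib sketch using the record quoted above; the helper name `decode_index_record` is hypothetical.

```python
import json

# Record copied from the index store entry shown above (raw strings keep the \" escapes).
raw_record = (
    r'{"_id":"602a8035-4b00-45d6-8b57-3c9646e4c07e",'
    r'"__data__":"{\"index_id\": \"602a8035-4b00-45d6-8b57-3c9646e4c07e\", '
    r'\"summary\": null, \"nodes_dict\": {}, \"doc_id_dict\": {}, '
    r'\"embeddings_dict\": {}}","__type__":"vector_store"}'
)


def decode_index_record(raw: str) -> dict:
    """Decode an index-store record whose __data__ field is a nested JSON string."""
    outer = json.loads(raw)
    inner = json.loads(outer["__data__"])  # second decode for the nested payload
    return {"type": outer["__type__"], **inner}


record = decode_index_record(raw_record)
print(record["type"], record["index_id"])
for key in ("nodes_dict", "doc_id_dict", "embeddings_dict"):
    print(key, len(record[key]))  # all three print 0 for this record
```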

Will VectorStoreIndex.from_vector_store(vector_store) disable saving documents to MongoDB?
hmm, interesting that it's showing up in Mongo, because you never set store_nodes_override=True 🤔
maybe try setting that
in the constructor of the index
Will try that. Is there a way for us to store the documents/nodes somewhere with VectorStoreIndex.from_vector_store(vector_store)? We prefer not to write our own doc storage wrapper.
no, in that case I would continue using the storage context and load_index_from_storage

You might want to configure the MongoDB index store as well, just to keep everything off disk.

So the full code might look like

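```python
from llama_index.core import StorageContext, VectorStoreIndex, load_index_from_storage

# vector_store, docstore, index_store and nodes are created as in the question
storage_context = StorageContext.from_defaults(
  vector_store=vector_store,
  docstore=docstore,
  index_store=index_store
)

# create
index = VectorStoreIndex(
  nodes,
  storage_context=storage_context,
  # ... other constructor kwargs
)
index.set_index_id("some index id")  # replace the random UUID with a known id

# persist
# nothing to do actually, since it should persist automatically (the stores are not on disk)

# load
index = load_index_from_storage(storage_context, index_id="some index id")
```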
But my debugging shows that the MongoDB index entry is created as soon as the constructor VectorStoreIndex(nodes, storage_context=storage_context, show_progress=True) runs. Will index.set_index_id("some index id") change the key in MongoDB?
@Logan M You are right, index.set_index_id("some index id") changed the key in MongoDB. So it saved the index under the new key?
yeah, that's what it does
that way you get a known index id instead of a random UUID
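A stdlib sketch of what renaming the index does to the store, under one assumption: the rename removes the entry under the old random key and re-saves the same struct under the chosen key, which is consistent with the key change observed in MongoDB above. `ToyIndex` and the plain-dict store are hypothetical, not the LlamaIndex classes.

```python
import uuid
from typing import Dict


class ToyIndex:
    """Hypothetical index that registers its struct in a shared store when created."""

    def __init__(self, store: Dict[str, dict]) -> None:
        self.store = store
        # Random id, like a fresh constructor call
        self.struct = {"index_id": str(uuid.uuid4())}
        self.store[self.struct["index_id"]] = self.struct

    def set_index_id(self, new_id: str) -> None:
        # Remove the entry under the old random key, then save the same struct under the new key.
        self.store.pop(self.struct["index_id"], None)
        self.struct["index_id"] = new_id
        self.store[new_id] = self.struct


store: Dict[str, dict] = {}
for _ in range(3):  # simulate running the ingestion script three times
    ToyIndex(store).set_index_id("some index id")

print(list(store))  # ['some index id']
```

Because the key ends up fixed, `load_index_from_storage(storage_context, index_id="some index id")` has a single, predictable entry to look up on every run.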
Do I need to call index.storage_context.persist()?
that's only for saving to disk
but in the above, we aren't saving anything to disk
@Logan M You are right. I do not see docstore data in MongoDB.
Even VectorStoreIndex(nodes, storage_context=storage_context, show_progress=True, store_nodes_override=True) doesn't store docs in MongoDB.
🤔 store_nodes_override should definitely enable it
it calls add_documents on the nodes if that's True
is it just a delay if there's a lot of data?
yup it was some delay or something. I do see them now.
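The store_nodes_override behaviour discussed above can be sketched with plain dicts and lists. This rests on an assumption not stated in the thread: by default the docstore is skipped when the vector store already keeps the node text (as Qdrant does), and `store_nodes_override=True` forces the write. `add_nodes` and the list/dict stores are hypothetical.

```python
from typing import Dict, List


def add_nodes(
    nodes: List[Dict[str, str]],
    vector_store: List[Dict[str, str]],
    docstore: Dict[str, Dict[str, str]],
    vector_store_stores_text: bool,
    store_nodes_override: bool = False,
) -> None:
    # Every node always goes to the vector store (which, for Qdrant, also keeps the text).
    vector_store.extend(nodes)
    # The docstore is only written when the vector store cannot hold the text,
    # or when the caller explicitly asks for it.
    if not vector_store_stores_text or store_nodes_override:
        for node in nodes:
            docstore[node["id_"]] = node


nodes = [{"id_": "n1", "text": "hello"}, {"id_": "n2", "text": "world"}]

default_docstore: Dict[str, Dict[str, str]] = {}
add_nodes(nodes, [], default_docstore, vector_store_stores_text=True)

override_docstore: Dict[str, Dict[str, str]] = {}
add_nodes(nodes, [], override_docstore, vector_store_stores_text=True, store_nodes_override=True)

print(len(default_docstore), len(override_docstore))  # 0 2
```

Under this model, the docstore entries seen earlier without the override most likely came from the explicit `docstore.add_documents(nodes)` call in the question's code.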
Thanks @Logan M, I think I am all set. One last question: is it expected that all the dicts in the index store are empty?

```json
{"_id":"ae60ab16-88b8-41ea-9fd9-e64968a68e5f","__data__":"{\"index_id\": \"ae60ab16-88b8-41ea-9fd9-e64968a68e5f\", \"summary\": null, \"nodes_dict\": {}, \"doc_id_dict\": {}, \"embeddings_dict\": {}}","__type__":"vector_store"}
```
I think it's probably fine? 🤷‍♂️
Seems like it, but I am curious why these dicts exist if they are empty in my case.
old legacy stuff
Haven't dug into that code in a hot minute
Then I am wondering, what is the point of the index store at all? Everything is in the vector store and docstore, right?
Why would someone save the index store?
It's supposed to keep track of the node IDs belonging to that index. Whether that's working at the moment I'm not sure, but I'm pretty sure it works without MongoDB.
So, not sure tbh
Don't have time to debug at the moment. Might take a deeper look later
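The bookkeeping role described above can be modelled in a few lines: the index struct records each node id it owns in `nodes_dict` (the local test in this thread shows it mapping an id to itself), and the store saves the struct as a `{"__type__": ..., "__data__": <JSON string>}` entry. A stdlib sketch; `ToyVectorIndexStruct` and `to_store_entry` are hypothetical names, not LlamaIndex classes.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyVectorIndexStruct:
    index_id: str
    summary: Optional[str] = None
    nodes_dict: Dict[str, str] = field(default_factory=dict)
    doc_id_dict: Dict[str, list] = field(default_factory=dict)
    embeddings_dict: Dict[str, list] = field(default_factory=dict)

    def add_node(self, node_id: str) -> None:
        # Track membership: this node belongs to this index.
        self.nodes_dict[node_id] = node_id

    def to_store_entry(self) -> Dict[str, str]:
        # The payload is serialized to a JSON string and nested under __data__.
        payload = {
            "index_id": self.index_id,
            "summary": self.summary,
            "nodes_dict": self.nodes_dict,
            "doc_id_dict": self.doc_id_dict,
            "embeddings_dict": self.embeddings_dict,
        }
        return {"__type__": "vector_store", "__data__": json.dumps(payload)}


struct = ToyVectorIndexStruct(index_id="test_index")
struct.add_node("2127aa66-2e6b-4337-b910-8af1e2cb5328")
entry = struct.to_store_entry()
print(entry["__data__"])
```

An index store entry whose `nodes_dict` stays empty, as in the MongoDB records above, would mean no `add_node`-style bookkeeping reached the stored struct.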
JK, decided to test quickly.

Hmm, testing locally (without MongoDB though; setting up Atlas is such a pain) and looking at the index store persisted to disk, nodes_dict is populated:

```python
>>> from llama_index.core import StorageContext, VectorStoreIndex, Document
>>> from llama_index.vector_stores.qdrant import QdrantVectorStore
>>> import qdrant_client
>>> client = qdrant_client.QdrantClient(path="./qdrant_db_test")
>>> vector_store = QdrantVectorStore(collection_name="test", client=client)
>>> storage_context = StorageContext.from_defaults(vector_store=vector_store)
>>> index = VectorStoreIndex.from_documents([Document.example()], storage_context=storage_context, store_nodes_override=True)
>>> index.set_index_id("test_index")
>>> index.storage_context.persist(persist_dir="./qdrant_test_storage")
```
```json
{"index_store/data": {"test_index": {"__type__": "vector_store", "__data__": "{\"index_id\": \"test_index\", \"summary\": null, \"nodes_dict\": {\"2127aa66-2e6b-4337-b910-8af1e2cb5328\": \"2127aa66-2e6b-4337-b910-8af1e2cb5328\"}, \"doc_id_dict\": {}, \"embeddings_dict\": {}}"}}}
```