Updated 10 months ago

Docstore

@Logan M How can I iterate through all Nodes in a VectorStoreIndex? Previously, VectorStoreIndex.index_struct.nodes_dict or VectorStoreIndex.docstore.docs worked for this purpose. Both stopped working after I customized storage_context to use a vector store (LanceDB), with a storage context like this:
Plain Text
StorageContext(docstore=<llama_index.core.storage.docstore.simple_docstore.SimpleDocumentStore object at 0x7fe31e89bd30>,
               index_store=<llama_index.core.storage.index_store.simple_index_store.SimpleIndexStore object at 0x7fe31e8994e0>,
               vector_stores={'default': <llama_index.vector_stores.lancedb.base.LanceDBVectorStore object at 0x7fe31e89b3a0>,
                              'image': <llama_index.core.vector_stores.simple.SimpleVectorStore object at 0x7fe31e89ba30>},
               graph_store=<llama_index.core.graph_stores.simple.SimpleGraphStore object at 0x7fe31e89b9a0>)

All I did then was:
Plain Text
index = VectorStoreIndex(nodes, storage_context=storage_context)
index.storage_context.persist()

After this, ./storage/docstore.json is empty. Is this expected? If I don't customize storage_context, ./storage/docstore.json contains the TextNodes I used to create the VectorStoreIndex.

I was hoping that if I switched to LanceDB or any other vector store, LlamaIndex's behavior with respect to the docstore etc. would remain unchanged. Am I missing something obvious? Thank you for any assistance.
4 comments
Using a third party vector db integration disables the docstore/indexstore, and stores everything in the vector db.

This simplifies storage, at the cost of losing some convenience in terms of accessing storage.

You can override this though by using store_nodes_override=True in the constructor for the VectorStoreIndex

But this means you still need to persist the storage context to disk and load it back from there

Plain Text
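# Save the docstore and index store to disk
index.storage_context.persist(persist_dir="./storage")

# Later: reload them from disk and reattach the LanceDB vector store
storage_context = StorageContext.from_defaults(persist_dir="./storage", vector_store=lancedb_vector_store)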
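
To tie this back to the original question, here is a minimal sketch of building the index with store_nodes_override=True and then iterating over its nodes. It assumes `nodes` and a LanceDB vector store object already exist as in the question. The helper names `build_index` and `list_nodes` are made up for illustration, and the llama_index import is kept inside the function so that `list_nodes` can be tried on its own.

```python
def build_index(nodes, vector_store, persist_dir="./storage"):
    """Build an index that also keeps nodes in the docstore, then persist it.

    `nodes` and `vector_store` are assumed to be created elsewhere,
    for example the TextNodes and LanceDBVectorStore from the question.
    """
    from llama_index.core import StorageContext, VectorStoreIndex

    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    index = VectorStoreIndex(
        nodes,
        storage_context=storage_context,
        store_nodes_override=True,  # also store nodes in the docstore/index store
    )
    # The docstore now has content, so it must be saved to disk as well.
    index.storage_context.persist(persist_dir=persist_dir)
    return index


def list_nodes(index):
    """Return every node held in the index's docstore."""
    return list(index.docstore.docs.values())
```

With the override enabled, `list_nodes(index)` should return the nodes that were passed in. Without it, the docstore stays empty when a third-party vector store is used, which matches the empty docstore.json described in the question.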
Thanks for the pointer. So what is the benefit of using a 3rd party vector db, if the cost is losing certain access to the data? I'll keep using SimpleVectorStore for now then, since we are still experimenting and have not thought about production yet.
Ah never mind, kapa gave satisfactory answers for my follow up question.
Third-party vector DBs are highly optimized, scalable, and hostable.

You can still use the docstore on its own, or alongside the vector store using the code above, if needed.
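
As a rough sketch of what using the docstore on its own could look like, the helpers below save nodes into a SimpleDocumentStore independently of any index and read them back later. The function names (`save_nodes`, `load_docstore`, `node_texts`) and the default path are illustrative assumptions, not part of the original answer.

```python
DEFAULT_PATH = "./storage/docstore.json"  # assumed location, adjust as needed


def save_nodes(nodes, persist_path=DEFAULT_PATH):
    """Store nodes in a standalone docstore and write it to disk."""
    from llama_index.core.storage.docstore import SimpleDocumentStore

    docstore = SimpleDocumentStore()
    docstore.add_documents(nodes)
    docstore.persist(persist_path=persist_path)
    return docstore


def load_docstore(persist_path=DEFAULT_PATH):
    """Load a previously persisted docstore from disk."""
    from llama_index.core.storage.docstore import SimpleDocumentStore

    return SimpleDocumentStore.from_persist_path(persist_path)


def node_texts(docstore):
    """Map node ID to text content for every node in a docstore."""
    return {node_id: node.get_content() for node_id, node in docstore.docs.items()}
```

This keeps node access separate from whichever vector store holds the embeddings, at the cost of maintaining a second copy of the node data.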