`load_index_from_storage` vs

At a glance

The community members are discussing the differences between load_index_from_storage and VectorStoreIndex.from_vector_store(). They note that from_vector_store() is meant for use with a vector database integration, which by default stores all data in the vector database and therefore has no docstore. To get a docstore, the community members suggest either adding to it manually and persisting it, or overriding the default with VectorStoreIndex(..., store_nodes_override=True) and persisting the storage context yourself.

When storing multiple indices, the community members explain that each index is its own collection/namespace in the vector database, and you can use from_vector_store() to retrieve them. If you need the index.docstore.docs, you can set the override and persist/load the storage context.

The community members also discuss a specific use case where the user is creating two different indices (VectorStoreIndex and SummaryIndex) and storing them in a Qdrant vector store. They suggest using load_indices_from_storage to retrieve the stored indices, and note that from_vector_store() cannot be used for the summary index, since a summary index does not use a vector store.

dhiraj

load_index_from_storage vs VectorStoreIndex.from_vector_store() -> do both of these do the same thing?

Because when I print index.docstore.docs from the index returned by load_index_from_storage, I can see the documents, but that is not the case for the index returned by VectorStoreIndex.from_vector_store().

10 comments

Logan M

from_vector_store() is meant for when you are using a vector db integration. By default, vector db integrations store all data in the vector db to simplify storage. This means no docstore.

Logan M

If you want a docstore, you should manually add to it and persist it. You can also override the disabling by using VectorStoreIndex(..., store_nodes_override=True), but then you need to handle persisting the storage context.

dhiraj

So when storing multiple indices, how should one proceed if one wants to use a vector db integration?

dhiraj

I didn't understand this point.

Logan M

Each index is its own collection/namespace in your vector db. So you can separate them like that and use from_vector_store.

If you absolutely need index.docstore.docs you can set the override, and then persist/load as needed

Plain Text

index = VectorStoreIndex(..., storage_context=storage_context, store_nodes_override=True)
index.storage_context.persist(persist_dir="./storage")
...
ctx = StorageContext.from_defaults(vector_store=vector_store, persist_dir="./storage")
index = load_index_from_storage(ctx)

dhiraj

Okay, so storing multiple indices and then later retrieving them from the "vector db" will not be possible using VectorStoreIndex.from_vector_store(), right?

Logan M

I'm not sure what you mean by retrieving an index? Do you mean the documents/nodes? You can always get the index using from_vector_store() -- it recreates the index and you can query/retrieve

dhiraj

OK, let me explain the problem.
I am creating two different indices over the same data, one a VectorStoreIndex and the other a SummaryIndex, and I am using a Qdrant vector store to store them. Here is the code:

Plain Text

# ingest.py
from llama_index.core import StorageContext, SummaryIndex, VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

client = QdrantClient(path="./vector_store/db")
qdrant_vector_store = QdrantVectorStore(client=client,
                                        collection_name="index-collection")
qdrant_storage_context = StorageContext.from_defaults(vector_store=qdrant_vector_store)

# generate embeddings and set the index id (docs is the list of documents loaded earlier)
vector_store_index = VectorStoreIndex.from_documents(docs,
                                                     storage_context=qdrant_storage_context)
vector_store_index.set_index_id("vector-store-index")

summary_index = SummaryIndex.from_documents(documents=docs,
                                            storage_context=qdrant_storage_context)
summary_index.set_index_id("summary-index")

# store multiple indices in one place
qdrant_storage_context.persist(persist_dir="./vector_store/db")

The above is the ingestion part.
Now, in the chat/query part, I need to retrieve the stored indices and create the RouterQueryEngine.
I am doing it like this:

Plain Text

# query.py
from llama_index.core import StorageContext, load_indices_from_storage
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

client = QdrantClient(path="./vector_store/db")
qdrant_vector_store = QdrantVectorStore(client=client,
                                        collection_name="index-collection")
# create storage context
qdrant_storage_context = StorageContext.from_defaults(vector_store=qdrant_vector_store)

# load from the persisted data
loaded_indices = load_indices_from_storage(storage_context=qdrant_storage_context,
                                           index_ids=["vector-store-index", "summary-index"])

dhiraj

Can I replace load_indices_from_storage with from_vector_store()?

Logan M

Not for the summary index, since the summary index does not use a vector store.
