
Updated 8 months ago

`load_index_from_storage` vs `VectorStoreIndex.from_vector_store()`

load_index_from_storage vs VectorStoreIndex.from_vector_store() -> do both of these do the same thing?

I ask because when I print index.docstore.docs for the index returned by load_index_from_storage, I can see the documents, but that is not the case for the index returned by VectorStoreIndex.from_vector_store()
10 comments
from_vector_store() is meant for when you are using a vector db integration. By default, vector db integrations store all data (nodes and embeddings) in the vector db to simplify storage. This means the docstore stays empty
If you want a docstore, you should manually add to it and persist it. You can also override the disabling by using VectorStoreIndex(..., store_nodes_override=True), but then you need to handle persisting the storage context
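A minimal sketch of the default path described above, assuming a local Qdrant collection that was already populated at ingestion time. The helper name `reopen_vector_index` and its default path are made up for illustration; only `QdrantClient`, `QdrantVectorStore` and `VectorStoreIndex.from_vector_store` come from the libraries.

```python
def reopen_vector_index(collection_name: str, qdrant_path: str = "./vector_store/db"):
    """Reconnect to an existing Qdrant collection and rebuild the index object.

    Hypothetical helper: the name and the default path are illustrative only.
    """
    # Imported inside the function so the heavy dependencies load only when it runs.
    from llama_index.core import VectorStoreIndex
    from llama_index.vector_stores.qdrant import QdrantVectorStore
    from qdrant_client import QdrantClient

    client = QdrantClient(path=qdrant_path)
    vector_store = QdrantVectorStore(client=client, collection_name=collection_name)

    # No persist_dir is involved here: nodes and embeddings are read from Qdrant,
    # so index.docstore.docs is expected to be empty on the returned index.
    return VectorStoreIndex.from_vector_store(vector_store=vector_store)
```

The returned index can still be queried or used as a retriever; it is only the local docstore that is not filled in.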
So when storing multiple indices, how should one proceed if one wants to use a vector db integration?
I didn't understand this point.
Each index is its own collection/namespace in your vector db. So you can separate them like that and use from_vector_store

If you absolutely need index.docstore.docs you can set the override, and then persist/load as needed

Plain Text
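from llama_index.core import StorageContext, VectorStoreIndex, load_index_from_storage
# keep the nodes in the docstore even though a vector db integration is used
index = VectorStoreIndex(..., storage_context=storage_context, store_nodes_override=True)
# write the docstore and index store to disk
index.storage_context.persist(persist_dir="./storage")
...
# later: reload using the same vector store plus the persisted directory
ctx = StorageContext.from_defaults(vector_store=vector_store, persist_dir="./storage")
index = load_index_from_storage(ctx)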
Okay, so when storing multiple indices, retrieving them later from the "vector db" will not be possible using VectorStoreIndex.from_vector_store(), right?
I'm not sure what you mean by retrieving an index? Do you mean the documents/nodes? You can always get the index using from_vector_store() -- it recreates the index and you can query/retrieve
OK, let me explain the problem:
I am creating two different indexes over the same data: one is a VectorStoreIndex and the other is a SummaryIndex, and I am using the Qdrant vector store to store them. Here is the code:
Plain Text
# ingest.py
from llama_index.core import StorageContext, SummaryIndex, VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

client = QdrantClient(path="./vector_store/db")
qdrant_vector_store = QdrantVectorStore(client=client,
                                        collection_name="index-collection")
qdrant_storage_context = StorageContext.from_defaults(vector_store=qdrant_vector_store)

# generate embeddings and set the index id (docs is loaded earlier, not shown)
vector_store_index = VectorStoreIndex.from_documents(docs,
                                                     storage_context=qdrant_storage_context)
vector_store_index.set_index_id("vector-store-index")

summary_index = SummaryIndex.from_documents(documents=docs,
                                            storage_context=qdrant_storage_context)
summary_index.set_index_id("summary-index")

# Storing multiple indices in one place
qdrant_storage_context.persist(persist_dir="./vector_store/db")

The above is the ingestion part.
Now, in the chat or query part, I need to retrieve the stored indices and create the RouterQueryEngine.
I am doing this like below:
Plain Text
# query.py
from llama_index.core import StorageContext, load_indices_from_storage
from llama_index.vector_stores.qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

client = QdrantClient(path="./vector_store/db")
qdrant_vector_store = QdrantVectorStore(client=client,
                                        collection_name="index-collection")
# create storage context; persist_dir is needed so that the docstore and
# index store written by ingest.py are loaded alongside the vector store
qdrant_storage_context = StorageContext.from_defaults(vector_store=qdrant_vector_store,
                                                      persist_dir="./vector_store/db")

# load from the persisted data
loaded_indices = load_indices_from_storage(storage_context=qdrant_storage_context,
                                           index_ids=['vector-store-index', 'summary-index'])
Can I replace load_indices_from_storage with from_vector_store()?
Not for the summary index, since the summary index does not use a vector store. Its nodes live in the docstore, so it still has to be loaded with load_index_from_storage
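One way the two loading paths could be combined for the router is sketched below. It assumes the ingestion step persisted the storage context to `./vector_store/db` and set the id `summary-index`, as in the code above; the helper name `build_router` and the tool descriptions are invented for illustration, and the selector falls back to whatever LLM is configured globally.

```python
def build_router(vector_store, persist_dir: str = "./vector_store/db"):
    """Hypothetical helper: build a RouterQueryEngine over both indexes."""
    # Imported inside the function so the heavy dependencies load only when it runs.
    from llama_index.core import StorageContext, VectorStoreIndex, load_index_from_storage
    from llama_index.core.query_engine import RouterQueryEngine
    from llama_index.core.selectors import LLMSingleSelector
    from llama_index.core.tools import QueryEngineTool

    # Vector index: everything it needs is in the vector db.
    vector_index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

    # Summary index: its nodes are in the persisted docstore, so load from disk.
    ctx = StorageContext.from_defaults(vector_store=vector_store, persist_dir=persist_dir)
    summary_index = load_index_from_storage(ctx, index_id="summary-index")

    tools = [
        QueryEngineTool.from_defaults(
            query_engine=vector_index.as_query_engine(),
            description="Answers specific questions about the documents.",  # illustrative
        ),
        QueryEngineTool.from_defaults(
            query_engine=summary_index.as_query_engine(response_mode="tree_summarize"),
            description="Summarizes the documents as a whole.",  # illustrative
        ),
    ]
    return RouterQueryEngine(
        selector=LLMSingleSelector.from_defaults(),
        query_engine_tools=tools,
    )
```

Calling `build_router(qdrant_vector_store).query("...")` would then route each question to one of the two engines.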