Could you please guide me on how to


Could you please guide me on how to persist multiple vector indices within the storage context?



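from llama_index import StorageContext, VectorStoreIndex

# one storage context shared by both indexes
storage_context = StorageContext.from_defaults()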

index1 = VectorStoreIndex.from_documents(documents1, storage_context=storage_context)
index1.set_index_id("index_1")

index2 = VectorStoreIndex.from_documents(documents2, storage_context=storage_context)
index2.set_index_id("index_2")

storage_context.persist(persist_dir="./storage")

from llama_index import load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="./storage")
index1 = load_index_from_storage(storage_context, index_id="index_1")

Yyh

Thank you for the suggestions. I tried the following, but because each index is built through the 'default' vector store, I have to re-register a vector store under 'default' every time. Is there a best practice for this?


storage_context.add_vector_store(storage_context.vector_stores["index_1"], "default")
index1 = VectorStoreIndex(format_nodes, storage_context=storage_context, service_context=service_context)
index1.set_index_id("index_1")

for node in format_nodes:
    node.text_template = '{metadata_str}'

storage_context.add_vector_store(storage_context.vector_stores["index_2"], "default")
index2 = VectorStoreIndex(format_nodes, storage_context=storage_context, service_context=service_context)
index2.set_index_id("index_2")

del storage_context.vector_stores["default"]
storage_context.persist("./storage")

Logan M

You don't need to explicitly add the vector store or delete anything. Just use a single storage context for both indexes.

The example I gave above works out of the box, no? Looking at your code, I might do this:


# create
from llama_index import StorageContext, VectorStoreIndex

storage_context = StorageContext.from_defaults()

index1 = VectorStoreIndex(format_nodes, storage_context=storage_context, service_context=service_context)
index1.set_index_id("index_1")

for node in format_nodes:
    node.text_template = '{metadata_str}'

index2 = VectorStoreIndex(format_nodes, storage_context=storage_context, service_context=service_context)
index2.set_index_id("index_2")

# save
storage_context.persist(persist_dir="./storage")

# load
from llama_index import load_index_from_storage

storage_context = StorageContext.from_defaults(persist_dir="./storage")
index1 = load_index_from_storage(storage_context, index_id="index_1")
index2 = load_index_from_storage(storage_context, index_id="index_2")

Yyh

I ran the code as you suggested, but I'm having an issue with persistence. The vectors for index2 are overwriting the vectors for index1. As a result, when I execute:

index1 = load_index_from_storage(storage_context, index_id="index_1")
index2 = load_index_from_storage(storage_context, index_id="index_2")

I end up with both index1 and index2 containing the vectors from index2.

Logan M

🤔 The vector store should contain the vectors for both indexes, and the index store is used to keep track of which vectors belong to which index.

I'll take a look in the morning I suppose. But I think we even have a unit test for this lol

Yyh

Thanks for the quick reply. I'm not sure what the unit tests cover, but there's a particular aspect of my case that might be causing the issue: my multiple vector indexes are derived from the same set of node_ids. It's possible that this is leading to the vectors being overwritten. Could this be a factor?

Looking forward to your insights in the morning.

Logan M

Ohh... that's 100% the issue

Try creating a new set of nodes for the second vector store maybe?
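
A rough sketch of what "a new set of nodes" could look like: copy each node, give the copy a fresh ID, and change the template only on the copy. The `Node` dataclass and the `clone_with_new_ids` helper below are hypothetical stand-ins written with the standard library only, not LlamaIndex classes; with real nodes the same idea should apply, but check the ID field name against your installed version.

```python
import copy
import uuid
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a LlamaIndex node (not the real class)."""
    text: str
    metadata: dict
    text_template: str = "{metadata_str}\n\n{content}"
    node_id: str = field(default_factory=lambda: str(uuid.uuid4()))


def clone_with_new_ids(nodes, text_template):
    """Return deep copies of `nodes` with fresh IDs and a different template."""
    clones = []
    for node in nodes:
        clone = copy.deepcopy(node)
        clone.node_id = str(uuid.uuid4())  # fresh key, so nothing gets overwritten
        clone.text_template = text_template
        clones.append(clone)
    return clones


# first node set: used for index_1, left untouched
format_nodes = [Node("hello", {"title": "a"}), Node("world", {"title": "b"})]

# second node set: same content, new IDs, metadata-only template for index_2
metadata_nodes = clone_with_new_ids(format_nodes, "{metadata_str}")
```

The originals keep their IDs and template, so the first index is unaffected by whatever is built from the copies.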

Yyh

So it's the shared node_ids that are tripping things up. I was hoping to avoid duplicating nodes, since I'm just tweaking the text_template to index different features of the same nodes.

Creating a whole new set of nodes each time feels like a lot of extra work. Any chance there's a workaround that lets me stick with one set of nodes but still keep the indices from stepping on each other's toes?

Logan M

hmm, everything in the storage context uses node_ids as the key. So sharing them between two indexes, in the same storage context, will continue to cause issues
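
To make the collision concrete, here is a toy model of that behaviour. It uses plain dicts in place of the vector store and index store, so it is a simplification of the idea described in this thread rather than LlamaIndex's actual implementation; `add_index` and the sample IDs and embeddings are made up for illustration.

```python
# Toy model only: plain dicts standing in for the vector store and index store.
vector_store = {}  # node_id -> embedding
index_store = {}   # index_id -> list of node_ids


def add_index(index_id, node_ids, embeddings):
    """Register an index and write its embeddings under the node_id keys."""
    index_store[index_id] = list(node_ids)
    for node_id, embedding in zip(node_ids, embeddings):
        vector_store[node_id] = embedding  # same key: the previous value is replaced


node_ids = ["n1", "n2"]  # both indexes reuse the same node_ids
add_index("index_1", node_ids, [[1.0, 0.0], [0.0, 1.0]])
add_index("index_2", node_ids, [[0.5, 0.5], [0.9, 0.1]])

# Each index still knows its node_ids, but they now point at index_2's vectors.
vectors_for_index_1 = [vector_store[n] for n in index_store["index_1"]]
vectors_for_index_2 = [vector_store[n] for n in index_store["index_2"]]
```

Both lookups return the second set of embeddings, which matches the symptom reported above: each index loads fine by ID, yet both see index_2's vectors.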

Logan M

"different features of the same nodes": it's effectively a new node tbh though

Yyh

Thanks for the heads up, Logan. I'll keep that in mind. Cheers!
