
Loading from storage

At a glance

The post is about an AttributeError related to the 'index_structs' attribute of a 'dict' object. Community members discuss the issue, with one suggesting providing the full stack trace to better understand the problem. Another community member shares a working example using the GPTListIndex and SimpleDirectoryReader classes from the llama_index library.

The discussion then focuses on reusing nodes across multiple index structures and persisting multiple indices to the same storage. A community member notes that the documentation covers reusing nodes, but does not explain how to load from storage. Another community member suggests persisting each index separately, unless the indexes are very large. The maintainer of the llama_index library then provides guidance on reusing nodes and persisting multiple indices to the same storage, including a link to a relevant notebook example.

Plain Text
AttributeError: 'dict' object has no attribute 'index_structs'
8 comments
What's the full stack trace? Easier to see what's going on with that
Actually I'll also make a quick example that works, maybe that will help to align with what's going on
This flow seems to work fine for me (v0.6.1)

Plain Text
>>> from llama_index import GPTListIndex, SimpleDirectoryReader
>>> documents = SimpleDirectoryReader("./data").load_data()
>>> index = GPTListIndex.from_documents(documents)
>>> index.storage_context.persist(persist_dir="./my_index")
>>> import os
>>> os.listdir("./my_index")
['docstore.json', 'index_store.json', 'vector_store.json']
>>> from llama_index import StorageContext, load_index_from_storage
>>> storage_context = StorageContext.from_defaults(persist_dir="./my_index")
>>> new_index = load_index_from_storage(storage_context)
>>>
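As a quick sanity check on that flow, the reloaded index can be queried directly. This is only a sketch assuming the same v0.6.x API as above; the question string is a placeholder.

Plain Text
>>> query_engine = new_index.as_query_engine()
>>> response = query_engine.query("What is this document about?")
>>> print(response)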
I would like to reuse the nodes across multiple indexes
Plain Text
index1 = GPTVectorStoreIndex(nodes, service_context=service_context)
index2 = GPTListIndex(nodes, service_context=service_context)
How do I get the nodes from the storage_context?
Yes, but it does not explain how to load from storage. I have a first step where I load, parse, and store, finishing with
Plain Text
    storage_context_node.persist(persist_dir="./storage/" + directory)
Then I want to reuse this stored info to query from a Slack bot, a web bot, or a CLI bot, so I need to load from storage and recreate the nodes in order to build the index for each bot.
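One possible shape for that per-bot loading step is sketched below. It assumes the persisted docstore exposes the stored nodes through its docs mapping; the names directory and service_context come from the snippets in this thread, and everything else is illustrative rather than a confirmed recipe.

Plain Text
from llama_index import StorageContext, GPTVectorStoreIndex, GPTListIndex

# reload what the ingestion step persisted
storage_context = StorageContext.from_defaults(persist_dir="./storage/" + directory)

# pull the parsed nodes back out of the docstore (assumes a .docs mapping of id -> node)
nodes = list(storage_context.docstore.docs.values())

# rebuild whichever index each bot needs from the recovered nodes
index1 = GPTVectorStoreIndex(nodes, service_context=service_context)
index2 = GPTListIndex(nodes, service_context=service_context)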
Hmm, tried a few things. Thought I had it but then the index_struct was empty πŸ™ƒ

This probably needs some better UX. @jerryjliu0 Is it possible to call persist but then use that across different index types? Or something similar? Doesn't seem so straightforward at the moment πŸ€” Seems like you always need the original nodes or documents in the examples.

@jerome I would just persist each index separately for now. Shouldn't be a huge deal unless your indexes are many GBs
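For the persist-each-index-separately workaround, a minimal sketch (with hypothetical directory names) could look like this:

Plain Text
# persist each index to its own directory
index1.storage_context.persist(persist_dir="./storage/vector")
index2.storage_context.persist(persist_dir="./storage/list")

# later, load each one back independently
from llama_index import StorageContext, load_index_from_storage
index1 = load_index_from_storage(StorageContext.from_defaults(persist_dir="./storage/vector"))
index2 = load_index_from_storage(StorageContext.from_defaults(persist_dir="./storage/list"))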
@jerome you can reuse nodes from index structures, and also persist different indexes to the same storage. @Logan M I agree that we could make this more clear
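A hedged sketch of that multiple-indexes-in-one-storage pattern, assuming the v0.6.x load_indices_from_storage helper (the notebook link itself is not preserved in this thread):

Plain Text
from llama_index import StorageContext, load_indices_from_storage
from llama_index import GPTVectorStoreIndex, GPTListIndex

# build two indexes over the same nodes, sharing one storage context
storage_context = StorageContext.from_defaults()
index1 = GPTVectorStoreIndex(nodes, storage_context=storage_context, service_context=service_context)
index2 = GPTListIndex(nodes, storage_context=storage_context, service_context=service_context)
storage_context.persist(persist_dir="./storage")

# later: load every index persisted to that directory
storage_context = StorageContext.from_defaults(persist_dir="./storage")
indices = load_indices_from_storage(storage_context)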
