

Hi team, when loading an index with a storage context, and the storage context already contains the docstore, why does the index constructor still need to be called with nodes? Aren't the nodes already in the docstore, and therefore in the storage_context?

from llama_index import GPTVectorStoreIndex, StorageContext
from llama_index.docstore import MongoDocumentStore
from llama_index.node_parser import SimpleNodeParser

# create parser and parse documents into nodes
parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)

# create (or load) docstore and add nodes
docstore = MongoDocumentStore.from_uri(uri="")
docstore.add_documents(nodes)

# create storage context
storage_context = StorageContext.from_defaults(docstore=docstore)

# build index
index = GPTVectorStoreIndex(nodes, storage_context=storage_context)
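For reference, once the index has been built and persisted, it should be possible to reload it without passing nodes at all. A minimal sketch, assuming the storage context was persisted to a hypothetical "./storage" directory (load_index_from_storage is the loader llama_index provides for this):

from llama_index import StorageContext, load_index_from_storage

# rebuild the storage context from the persisted stores
storage_context = StorageContext.from_defaults(persist_dir="./storage")

# reload the index without re-supplying nodes
index = load_index_from_storage(storage_context)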
This is because we use a separate docstore and vector_store to keep the text and the embeddings. The last line is necessary to compute the embeddings and store them in the vector store.
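To illustrate the split, persisting the storage context (with the default local stores rather than the Mongo-backed one above) writes the two pieces to separate files; a sketch, with "./storage" as a hypothetical directory:

# node text and metadata go to the docstore,
# embeddings go to the vector store
storage_context.persist(persist_dir="./storage")
# ./storage/docstore.json      <- node text and metadata
# ./storage/vector_store.json  <- embeddings keyed by node id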
Is there a way to save the nodes? Essentially, I want to make sure I'm not overwriting the old nodes when I'm adding a new node/file.
The nodes are saved to the docstore when you call docstore.add_documents(nodes), and the embeddings of the nodes are saved to the vector store when you construct the vector store index.
Adding new nodes doesn't overwrite old ones.
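A sketch of that non-destructive update, assuming new_documents holds the newly added file and that your version exposes insert_nodes (which appends to both stores rather than overwriting):

# parse the new file and insert its nodes into the existing index
new_nodes = parser.get_nodes_from_documents(new_documents)
index.insert_nodes(new_nodes)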
@disiok Regarding generating multiple indices: when I persist the storage context after generating these two indexes, the doc_store contains the same document twice. Am I doing something wrong?
from llama_index import GPTListIndex, GPTVectorStoreIndex

list_index = GPTListIndex(nodes, storage_context=storage_context, service_context=service_context)
list_index.set_index_id("List Index")
vector_index = GPTVectorStoreIndex(nodes, storage_context=storage_context, service_context=service_context)
vector_index.set_index_id("Vector Index")
Hmm, that's surprising. That shouldn't be the case.
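For what it's worth, with two indexes sharing one storage context, each can be reloaded later by the id set above; a sketch, assuming the storage context was persisted and then rebuilt:

from llama_index import load_index_from_storage

# reload each index from the shared storage context by its id
list_index = load_index_from_storage(storage_context, index_id="List Index")
vector_index = load_index_from_storage(storage_context, index_id="Vector Index")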