
Updated 2 years ago


Hi team, when loading an index with a storage context whose docstore already contains the nodes, why does the index still need to be constructed with nodes? Aren't the nodes already in the docstore, and therefore in the storage_context?

from llama_index import GPTVectorStoreIndex, StorageContext
from llama_index.docstore import MongoDocumentStore
from llama_index.node_parser import SimpleNodeParser

# create parser and parse documents into nodes
# (documents is assumed to have been loaded earlier; not shown in the original snippet)
parser = SimpleNodeParser()
nodes = parser.get_nodes_from_documents(documents)

# create (or load) docstore and add nodes
docstore = MongoDocumentStore.from_uri(uri="")
docstore.add_documents(nodes)

# create storage context
storage_context = StorageContext.from_defaults(docstore=docstore)

# build index
index = GPTVectorStoreIndex(nodes, storage_context=storage_context)
This is because we use a separate docstore and vector_store to keep the text and the embeddings, respectively.
The last line is necessary to compute the embeddings and store them in the vector store.
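The split described in the reply above can be illustrated with a minimal, self-contained toy model in plain Python. This is not the LlamaIndex API: ToyDocStore, ToyVectorStore, fake_embed, and build_vector_index are hypothetical names invented for this sketch, and fake_embed stands in for a real embedding model. It shows why a docstore that already holds the nodes is not enough on its own: the embeddings only exist once the index is built from those nodes.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    node_id: str
    text: str


@dataclass
class ToyDocStore:
    # Hypothetical stand-in for a docstore: keeps node text, keyed by node ID.
    docs: Dict[str, Node] = field(default_factory=dict)

    def add_documents(self, nodes: List[Node]) -> None:
        for node in nodes:
            self.docs[node.node_id] = node


@dataclass
class ToyVectorStore:
    # Hypothetical stand-in for a vector store: keeps only embeddings, keyed by node ID.
    embeddings: Dict[str, List[float]] = field(default_factory=dict)


def fake_embed(text: str) -> List[float]:
    # Placeholder for a real embedding model: a tiny deterministic vector.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def build_vector_index(
    nodes: List[Node], docstore: ToyDocStore, vector_store: ToyVectorStore
) -> None:
    # Building the index is the step that computes embeddings for the given nodes.
    # Re-adding a node under the same ID just overwrites it with itself.
    docstore.add_documents(nodes)
    for node in nodes:
        vector_store.embeddings[node.node_id] = fake_embed(node.text)


nodes = [Node("n1", "hello world"), Node("n2", "llama index")]
docstore = ToyDocStore()
docstore.add_documents(nodes)
vector_store = ToyVectorStore()
print(len(docstore.docs), len(vector_store.embeddings))  # 2 0: text stored, no embeddings yet
build_vector_index(nodes, docstore, vector_store)
print(len(docstore.docs), len(vector_store.embeddings))  # 2 2
```

In this toy model, skipping build_vector_index leaves the vector store empty even though the docstore is fully populated, which mirrors why the last line of the question's snippet is still needed.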
Is there a way to save the nodes? Essentially, I want to make sure I'm not overwriting the old nodes when I add a new node/file.
The nodes are saved to the docstore when you call docstore.add_documents(nodes),
and the embeddings of the nodes are saved to the vector store when you construct the vector store index.
Adding new nodes doesn't overwrite old ones.
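As a minimal sketch of why adding new nodes leaves old ones in place, assume (as in this toy model) that stored nodes are keyed by a unique node ID. The add_nodes helper and the IDs below are hypothetical, chosen for illustration only. New IDs simply add entries; only reusing an existing ID replaces that one entry.

```python
from typing import Dict


def add_nodes(store: Dict[str, str], new_nodes: Dict[str, str]) -> None:
    # Hypothetical helper: insert nodes keyed by node ID.
    # Entries with other IDs are left untouched; only a node that
    # reuses an existing ID replaces that entry.
    for node_id, text in new_nodes.items():
        store[node_id] = text


store: Dict[str, str] = {}
add_nodes(store, {"file1-chunk0": "first file, chunk 0", "file1-chunk1": "first file, chunk 1"})
add_nodes(store, {"file2-chunk0": "second file, chunk 0"})  # a new file: new IDs
print(sorted(store))  # ['file1-chunk0', 'file1-chunk1', 'file2-chunk0']

add_nodes(store, {"file1-chunk0": "first file, chunk 0 (edited)"})  # same ID: replaced
print(len(store))  # 3
```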
@disiok When I generate multiple indices and then persist the storage context after building these two indexes, the docstore contains the same document twice. Am I doing something wrong?
# both indexes are built from the same nodes and share one storage_context (and so one docstore)
list_index = GPTListIndex(nodes, storage_context=storage_context, service_context=service_context)
list_index.set_index_id("List Index")
vector_index = GPTVectorStoreIndex(nodes, storage_context=storage_context, service_context=service_context)
vector_index.set_index_id("Vector Index")
Hmm, that's surprising. That shouldn't be the case.
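To illustrate why this shouldn't happen, here is a toy sketch that again assumes a docstore keyed by node ID; index_nodes and find_duplicate_texts are hypothetical helpers, not LlamaIndex functions. Two "indexes" writing the same nodes (same IDs) into one shared store leave a single copy. A duplicate only appears if the same text ends up stored under a different ID, and grouping IDs by text is one possible way to spot that. Whether that is the cause in this thread is not established.

```python
from collections import defaultdict
from typing import Dict, List


def index_nodes(shared_docstore: Dict[str, str], nodes: Dict[str, str]) -> None:
    # Hypothetical: each "index" writes its nodes into the shared docstore by node ID.
    shared_docstore.update(nodes)


def find_duplicate_texts(docstore: Dict[str, str]) -> Dict[str, List[str]]:
    # Group node IDs by text; any text stored under more than one ID is a duplicate.
    ids_by_text: Dict[str, List[str]] = defaultdict(list)
    for node_id, text in docstore.items():
        ids_by_text[text].append(node_id)
    return {text: ids for text, ids in ids_by_text.items() if len(ids) > 1}


nodes = {"a1": "same document text"}
docstore: Dict[str, str] = {}
index_nodes(docstore, nodes)  # e.g. the list index
index_nodes(docstore, nodes)  # e.g. the vector index: same nodes, same IDs
print(len(docstore), find_duplicate_texts(docstore))  # 1 {}

# The same text stored under a second, different ID is what shows up as a duplicate.
index_nodes(docstore, {"b7": "same document text"})
print(find_duplicate_texts(docstore))  # {'same document text': ['a1', 'b7']}
```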