Updated 2 years ago

Hi all

I have an api endpoint that does the following:

    ...
    parser = SimpleNodeParser()
    nodes = parser.get_nodes_from_documents(documents)

    # create (or load) docstore and add nodes
    docstore = MongoDocumentStore.from_uri(uri=URI, db_name='db', namespace='public')
    docstore.add_documents(nodes)

    # create storage context
    storage_context = StorageContext.from_defaults(
        docstore=docstore,
        index_store=MongoIndexStore.from_uri(uri=URI, db_name='db', namespace='public')
    )

    # build index
    index = GPTVectorStoreIndex(nodes, storage_context=storage_context)


On a separate endpoint, I want to load the index and query it:

    storage_context = StorageContext.from_defaults(
        docstore=MongoDocumentStore.from_uri(uri=URI, db_name='db', namespace='public'),
        index_store=MongoIndexStore.from_uri(uri=URI, db_name='db', namespace='public')
    )
    index = load_index_from_storage(storage_context)
    query_engine = index.as_query_engine()
    result = query_engine.query(data.prompt)


This errors out with KeyError('1').

I believe it stems from a call that load_index_from_storage makes deep in the code: storage_context.index_store.index_structs().
That call is what raises KeyError('1'), which I take to mean it couldn't find an index/index_struct. But https://gpt-index.readthedocs.io/en/latest/how_to/storage/index_stores.html says that, if you use MongoIndexStore, you don't have to persist storage.

Am I loading the index correctly, or am I doing something fundamentally wrong?
5 comments
hey @paragoniq there are 2 things wrong (mostly our fault for not being clear):
  1. using the same namespace for the docstore and the index store in MongoDB makes the two clash
  2. right now the index still uses the default SimpleVectorStore, which keeps embeddings in memory unless persisted (so in your example, those embeddings are lost between endpoints)
To resolve 2, you can either
  1. persist storage_context.vector_store to disk and explicitly reload it
  2. or use any hosted vector store, e.g. Pinecone
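For reference, here is a rough sketch of both fixes together, with option 1 for the vector store. It assumes the legacy llama_index (roughly 0.6) API used in the question, including SimpleVectorStore.persist and SimpleVectorStore.from_persist_path, so check the names against your installed version. URI, DB_NAME, the two namespace strings, VECTOR_STORE_PATH and the helper function names are made up for illustration.

```python
import os

# Sketch only: assumes the legacy llama_index (~0.6) API used in this thread.
# All values below are placeholders, not settings from the original post.
URI = "mongodb://localhost:27017"
DB_NAME = "db"
DOCSTORE_NAMESPACE = "public_docstore"        # separate namespace per store
INDEX_STORE_NAMESPACE = "public_index_store"  # so the two no longer clash
VECTOR_STORE_PATH = "./storage/vector_store.json"


def make_storage_context(vector_store=None):
    """Build a storage context whose Mongo stores use distinct namespaces."""
    # Imported lazily so this module loads even without llama_index installed.
    from llama_index import StorageContext
    from llama_index.storage.docstore import MongoDocumentStore
    from llama_index.storage.index_store import MongoIndexStore

    return StorageContext.from_defaults(
        docstore=MongoDocumentStore.from_uri(
            uri=URI, db_name=DB_NAME, namespace=DOCSTORE_NAMESPACE
        ),
        index_store=MongoIndexStore.from_uri(
            uri=URI, db_name=DB_NAME, namespace=INDEX_STORE_NAMESPACE
        ),
        # None falls back to a fresh in-memory SimpleVectorStore.
        vector_store=vector_store,
    )


def build_index(nodes):
    """Build the index, then write the in-memory embeddings to disk."""
    from llama_index import GPTVectorStoreIndex

    storage_context = make_storage_context()
    storage_context.docstore.add_documents(nodes)
    index = GPTVectorStoreIndex(nodes, storage_context=storage_context)

    # Make sure the target directory exists before persisting.
    os.makedirs(os.path.dirname(VECTOR_STORE_PATH), exist_ok=True)
    storage_context.vector_store.persist(persist_path=VECTOR_STORE_PATH)
    return index


def load_index():
    """Reload the persisted embeddings and the index stored in MongoDB."""
    from llama_index import load_index_from_storage
    from llama_index.vector_stores import SimpleVectorStore

    vector_store = SimpleVectorStore.from_persist_path(VECTOR_STORE_PATH)
    return load_index_from_storage(make_storage_context(vector_store=vector_store))
```

The first endpoint would call build_index(nodes) and the second load_index().as_query_engine(). Both endpoints need to see the same VECTOR_STORE_PATH, which is the main reason to prefer a hosted vector store if they run on different machines.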
thank you! much obliged
that worked, thanks.