Updated 5 months ago

Hi all, I'm implementing the auto-merging retriever and I was wondering if it's possible to use a vector store as the document store too. I've been setting the storage context like this:
Plain Text
storage_context = StorageContext.from_defaults(vector_store=vector_store)

But whilst the leaf nodes are persisted in the vector_store, it doesn't seem to include the documents (and hierarchy) like it does when I persist them locally.

Perhaps what I'm trying to do isn't supported? It seems odd to me that you wouldn't be able to use your vector store as your docstore too though.
5 comments
I don't know if this is what you are looking for, but I am playing with storage too
Plain Text
# Load the documents
from llama_index import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("/RAG_VectorDB/test/").load_data()
print("Document ID:", documents[0].doc_id)

# Build the index (service_context and storage_context are created elsewhere)
print("index")
index = VectorStoreIndex.from_documents(documents, service_context=service_context, storage_context=storage_context)

# Alternative: attach to an existing vector store instead of rebuilding
# index = VectorStoreIndex.from_vector_store(vector_store, service_context=service_context, storage_context=storage_context)

# Save the vectors
# for doc in documents:
#     index.insert(doc)
# index.storage_context.persist()

The first index call (from_documents) loads the documents, and the commented-out block at the bottom saves them in the database.
I am currently using Milvus. For me, that last block creates entities, but they are gone after a restart. I hope this helps you a bit anyway.
@Ed Yeah, basically the vector store cannot be a docstore, because vector stores do not provide the interfaces a docstore requires.

You can use any docstore (SimpleDocumentStore, RedisDocumentStore, MongoDocumentStore) and persist and load the data that way.

i.e.

Plain Text
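from llama_index import StorageContext, VectorStoreIndex
from llama_index.storage.docstore import SimpleDocumentStore

docstore = SimpleDocumentStore()
storage_context = StorageContext.from_defaults(docstore=docstore, vector_store=vector_store)

index = VectorStoreIndex(..., store_nodes_override=True)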


Setting store_nodes_override=True makes the index populate the docstore as it builds.
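For anyone wondering what the flag changes, here is a toy, library-free sketch of the behaviour as I understand it: when the vector store already holds the node text, the index skips the docstore unless the override is set. `ToyIndex`, `stores_text`, and the dict-based stores are made up for illustration; this is not LlamaIndex code.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Toy stand-in for an index that writes nodes to two stores."""

    stores_text: bool = True            # assumption: the vector store keeps node text
    store_nodes_override: bool = False  # mirrors the flag discussed above
    vector_store: dict = field(default_factory=dict)
    docstore: dict = field(default_factory=dict)

    def add_nodes(self, nodes: dict) -> None:
        for node_id, text in nodes.items():
            # Every node always goes to the vector store (fake embedding here).
            self.vector_store[node_id] = {"text": text, "embedding": [float(len(text))]}
            # The docstore is only written when the vector store cannot hold
            # text, or when the override forces it.
            if not self.stores_text or self.store_nodes_override:
                self.docstore[node_id] = text


nodes = {"leaf-1": "first chunk", "leaf-2": "second chunk"}

default_index = ToyIndex()
default_index.add_nodes(nodes)

override_index = ToyIndex(store_nodes_override=True)
override_index.add_nodes(nodes)

print(len(default_index.docstore))   # 0
print(len(override_index.docstore))  # 2
```

In the toy version, both indexes end up with the same vector store contents; only the override variant also has the nodes available by ID in the docstore.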
Thanks guys, that's really helpful. In theory there's no reason why you couldn't use the metadata fields in a vector store as the document store too (and just ignore the vector payloads). I appreciate why it wouldn't be recommended, though.
Yeah, it's a bit hacky and abuses the vector DB. Plus there are some methods (like get_all()) that some vector DBs don't support.

Having a raw key-value interface is a bit more useful
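To make that concrete, here is a rough, library-free sketch of why a plain key-value lookup suits the auto-merging case: the vector search returns leaf hits, and the retriever then needs to fetch each parent by ID to decide whether to merge. The names (`Node`, `ToyDocstore`, `merge_leaves`) and the 0.5 threshold are invented for illustration and are not the LlamaIndex API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    node_id: str
    text: str
    parent_id: Optional[str] = None
    child_ids: tuple = ()


class ToyDocstore:
    """Minimal key-value store: lookup by ID plus a full scan."""

    def __init__(self):
        self._data = {}

    def add(self, *nodes):
        for node in nodes:
            self._data[node.node_id] = node

    def get(self, node_id):
        return self._data[node_id]

    def get_all(self):
        return list(self._data.values())


def merge_leaves(retrieved_ids, docstore, threshold=0.5):
    """Swap retrieved leaves for their parent when more than `threshold`
    of that parent's children were retrieved."""
    by_parent = defaultdict(list)
    for node_id in retrieved_ids:
        by_parent[docstore.get(node_id).parent_id].append(node_id)

    merged = []
    for parent_id, child_ids in by_parent.items():
        if parent_id is None:
            merged.extend(child_ids)  # no parent to merge into
            continue
        parent = docstore.get(parent_id)  # the lookup a vector store lacks
        if len(child_ids) / len(parent.child_ids) > threshold:
            merged.append(parent_id)
        else:
            merged.extend(child_ids)
    return merged


docstore = ToyDocstore()
docstore.add(
    Node("p1", "parent one", child_ids=("a", "b", "c")),
    Node("a", "chunk a", parent_id="p1"),
    Node("b", "chunk b", parent_id="p1"),
    Node("c", "chunk c", parent_id="p1"),
    Node("p2", "parent two", child_ids=("d", "e", "f")),
    Node("d", "chunk d", parent_id="p2"),
    Node("e", "chunk e", parent_id="p2"),
    Node("f", "chunk f", parent_id="p2"),
)

# Pretend the vector search returned these leaf IDs.
print(merge_leaves(["a", "b", "d"], docstore))  # ['p1', 'd']
```

Two of the three children of p1 were hit, so they collapse into p1, while d stays a leaf. Every step is a get-by-ID, which is exactly what a key-value docstore gives you for free.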
Yep, makes a lot of sense. I really appreciate the replies on this. Thanks 👍