
Updated 5 months ago

I am implementing an application where I want to use Qdrant as a persistent database

At a glance

The community member is implementing an application that uses Qdrant as a persistent database, but is encountering issues when using an automerging method to split the documents. When loading the database and performing queries, they receive an error indicating that a document ID is not found. The issue seems to be related to the auto_merging_retriever.py file, specifically line 154 where the _try_merging method is called.

The comments suggest that Qdrant only stores vectors and their nodes, but the auto merging retriever relies on additional data in a persisted docstore. When using a vector store integration like Qdrant, the docstore is disabled to simplify storage, but the community member can override this and store the docstore elsewhere (e.g., on disk, or using Redis or MongoDB).

The community member provides some example code for creating a storage context, saving the docstore/index store locally, and loading the index. However, another community member notes that this example does not store the documents. The community member then shares their own implementation, which includes creating a DocumentStore and adding the documents and nodes to it. They are still unable to load the data back successfully.

The community member finds a possible solution by setting the docstore on the storage context when loading the index from the vector store, but they are unsure whether this is a clean approach.

I am implementing an application where I want to use Qdrant as a persistent database. However, when I use an auto-merging method to split the documents, I am then unable to properly save the data. When I load the database and perform queries, I get the error "ValueError: doc_id 199bc490-2969-4b2f-be85-71c0d29a078b not found.", raised from auto_merging_retriever.py, line 154, in _try_merging: nodes, is_changed_1 = self._get_parents_and_merge(nodes). Any suggestion on how I can properly use Qdrant? Thanks
14 comments
Qdrant only stores vectors + their nodes. But the auto-merging retriever relies on additional data in a persisted docstore.

When you use a vector store integration like Qdrant, the docstore is disabled to simplify storage. You can override this, but then you need to put the docstore somewhere (on disk, or use the Redis or MongoDB docstores)

VectorStoreIndex.from_documents(..., storage_context=storage_context, store_nodes_override=True)
thanks, do you have a working example reference?
Plain Text
from llama_index import StorageContext, VectorStoreIndex, load_index_from_storage

storage_context = StorageContext.from_defaults(vector_store=vector_store)

# create index (store_nodes_override also keeps the nodes in the docstore)
index = VectorStoreIndex.from_documents(
  documents,
  storage_context=storage_context,
  service_context=service_context,
  store_nodes_override=True
)

# save the docstore/index store locally
index.storage_context.persist(persist_dir="./storage")

# load the index
new_storage_context = StorageContext.from_defaults(
  vector_store=vector_store, persist_dir="./storage"
)

# optional service context
loaded_index = load_index_from_storage(new_storage_context, service_context=service_context)
but here you are not storing the documents
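As a reference for the Redis option mentioned earlier, here is a minimal sketch (assuming a local Redis server and llama_index's RedisDocumentStore; the namespace and connection details are illustrative, not from this thread):
Plain Text
# Sketch: back the docstore with Redis instead of the default in-memory store,
# so the auto-merging retriever can find parent nodes across restarts.
# Assumes a Redis server on localhost:6379; the namespace is a made-up example.
from llama_index import StorageContext, VectorStoreIndex
from llama_index.storage.docstore import RedisDocumentStore

doc_store = RedisDocumentStore.from_host_and_port(
    host="127.0.0.1", port=6379, namespace="my_app"
)

storage_context = StorageContext.from_defaults(
    docstore=doc_store,
    vector_store=vector_store,  # the QdrantVectorStore from above
)

index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
    store_nodes_override=True,  # keep nodes in the docstore as well
)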
My writing implementation is:
Plain Text
 
client = QdrantClient(
    path=config.persist_directory
)

vector_store = QdrantVectorStore(
    client=client,
    collection_name=config.collection
)

if automerge_documents:
    doc_store = DocumentStore()
else:
    doc_store = None

storage_context = StorageContext.from_defaults(
    docstore=doc_store,
    vector_store=vector_store
)

index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
    show_progress=True,
    store_nodes_override=automerge_documents
)

if automerge_documents:
    chunk_sizes = config.chunk_sizes or [2048, 512, 128]
    node_parser = HierarchicalNodeParser.from_defaults(chunk_sizes=chunk_sizes)
    nodes = node_parser.get_nodes_from_documents(documents)
    # leaf_nodes = get_leaf_nodes(nodes)
    # storage_context.docstore.add_documents(nodes)
    index.build_index_from_nodes(nodes=nodes)

    # index.docstore.add_documents(nodes)
    doc_store.add_documents(nodes)
    doc_store.add_documents(documents)

    index.docstore.persist(persist_path=os.path.join(config.persist_directory, "docstore.json"))
But I'm not able to load it back. Right now the loading code is the following:
Plain Text
if config.persist_directory:
    if automerge_documents:
        doc_store = DocumentStore.from_persist_path(
            persist_path=os.path.join(config.persist_directory, "docstore.json")
        )
    else:
        doc_store = None

    # load the database
    logging.info("loading local database...")

    client = QdrantClient(
        path=config.persist_directory,
    )

    vector_store = QdrantVectorStore(
        client=client,
        collection_name=config.collection
    )

    index = VectorStoreIndex.from_vector_store(
        vector_store,
        doc_store=doc_store,
        service_context=service_context,
    )
I found a possible solution, but it seems like there's an underlying issue.
When in 'reading mode' I create the index with VectorStoreIndex.from_vector_store(), I'm also passing the doc_store. However, internally a storage context is created with a totally new docstore. To solve this, I added this line:
Plain Text
index.storage_context.docstore = doc_store
I think this is not so clean, what do you think @Logan M?
Calling from_documents() will store the documents in the docstore, because I have store_nodes_override=True
Yea you can't use from_vector_store now with this method.

Need to do loaded_index = load_index_from_storage(new_storage_context, service_context=service_context) most likely
The above fix you suggest also works though
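Putting that suggestion together, here is a minimal sketch of the load path, assuming the full storage context was persisted at write time with index.storage_context.persist(persist_dir=...) as in the earlier example; the directory reuse is illustrative, not from the original code:
Plain Text
# Sketch: rebuild the storage context from the persisted docstore/index store
# plus the Qdrant vector store, then reload with load_index_from_storage
# instead of from_vector_store. Assumes persist_dir holds the files written
# by index.storage_context.persist(...) at indexing time.
from llama_index import StorageContext, load_index_from_storage

client = QdrantClient(path=config.persist_directory)
vector_store = QdrantVectorStore(client=client, collection_name=config.collection)

storage_context = StorageContext.from_defaults(
    vector_store=vector_store,
    persist_dir=config.persist_directory,  # illustrative: same dir used for persist()
)

loaded_index = load_index_from_storage(
    storage_context,
    service_context=service_context,  # optional
)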