
Updated 5 months ago

I am implementing an application where I want to use Qdrant as a persistent database

At a glance

The community member is implementing an application that uses Qdrant as a persistent database, but is encountering issues when using an automerging method to split the documents. When loading the database and performing queries, they receive an error indicating that a document ID is not found. The issue seems to be related to the auto_merging_retriever.py file, specifically line 154 where the _try_merging method is called.

The comments suggest that Qdrant only stores vectors and their nodes, but the auto merging retriever relies on additional data in a persisted docstore. When using a vector store integration like Qdrant, the docstore is disabled to simplify storage, but the community member can override this and store the docstore elsewhere (e.g., on disk, or using Redis or MongoDB).

The community member provides some example code for creating a storage context, saving the docstore/index store locally, and loading the index. However, another community member notes that this example does not store the documents. The community member then shares their own implementation, which includes creating a DocumentStore and adding the documents and nodes to it. They are still unable to load the data back successfully.

The community member finds a possible solution by setting the docstore on the storage context when loading the index from the vector store, but they are unsure whether this is a clean approach.

I am implementing an application where I want to use Qdrant as a persistent database. However, when I use an auto-merging method to split the documents, I am then unable to properly save the data. When I load the database and perform queries, I get the error "ValueError: doc_id 199bc490-2969-4b2f-be85-71c0d29a078b not found.", raised from auto_merging_retriever.py, line 154, in _try_merging: nodes, is_changed_1 = self._get_parents_and_merge(nodes). Any suggestion on how I can properly use Qdrant? Thanks
14 comments
Qdrant only stores vectors + their nodes. But the auto-merging retriever relies on additional data in a persisted docstore.

When you use a vector store integration like Qdrant, the docstore is disabled to simplify storage. You can override this, but then you need to put the docstore somewhere (on disk, or use the Redis or MongoDB docstores)

VectorStoreIndex.from_documents(..., storage_context=storage_context, store_nodes_override=True)
thanks, do you have a working example reference?
Plain Text
from llama_index import StorageContext, VectorStoreIndex, load_index_from_storage

storage_context = StorageContext.from_defaults(vector_store=vector_store)

# create index (store_nodes_override also keeps the nodes in the docstore)
index = VectorStoreIndex.from_documents(
  documents,
  storage_context=storage_context,
  service_context=service_context,
  store_nodes_override=True
)

# save the docstore/index store locally
index.storage_context.persist(persist_dir="./storage")

# load the index
new_storage_context = StorageContext.from_defaults(
  vector_store=vector_store, persist_dir="./storage"
)

# optional service context
loaded_index = load_index_from_storage(new_storage_context, service_context=service_context)
but here you are not storing the documents
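As a reference for the Redis option mentioned earlier, here is a minimal sketch (assuming a local Redis server and llama_index's RedisDocumentStore; the namespace and connection details are illustrative, not from this thread):
Plain Text
# Sketch: back the docstore with Redis instead of the default in-memory store,
# so the auto-merging retriever can find parent nodes across restarts.
# Assumes a Redis server on localhost:6379; the namespace is a made-up example.
from llama_index import StorageContext, VectorStoreIndex
from llama_index.storage.docstore import RedisDocumentStore

doc_store = RedisDocumentStore.from_host_and_port(
    host="127.0.0.1", port=6379, namespace="my_app"
)

storage_context = StorageContext.from_defaults(
    docstore=doc_store,
    vector_store=vector_store,  # the QdrantVectorStore from above
)

index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
    store_nodes_override=True,  # keep nodes in the docstore as well
)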
My writing implementation is:
Plain Text
 
client = QdrantClient(
    path=config.persist_directory
)

vector_store = QdrantVectorStore(
    client=client,
    collection_name=config.collection
)

if automerge_documents:
    doc_store = DocumentStore()
else:
    doc_store = None

storage_context = StorageContext.from_defaults(
    docstore=doc_store,
    vector_store=vector_store
)

index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
    show_progress=True,
    store_nodes_override=automerge_documents
)

if automerge_documents:
    chunk_sizes = config.chunk_sizes or [2048, 512, 128]
    node_parser = HierarchicalNodeParser.from_defaults(chunk_sizes=chunk_sizes)
    nodes = node_parser.get_nodes_from_documents(documents)
    # leaf_nodes = get_leaf_nodes(nodes)
    # storage_context.docstore.add_documents(nodes)
    index.build_index_from_nodes(nodes=nodes)

    # index.docstore.add_documents(nodes)
    doc_store.add_documents(nodes)
    doc_store.add_documents(documents)

    index.docstore.persist(persist_path=os.path.join(config.persist_directory, "docstore.json"))
But I'm not able to load it back. Right now the loading code is the following:
Plain Text
if config.persist_directory:
    if automerge_documents:
        doc_store = DocumentStore.from_persist_path(
            persist_path=os.path.join(config.persist_directory, "docstore.json")
        )
    else:
        doc_store = None

    # load the database
    logging.info("loading local database...")

    client = QdrantClient(
        path=config.persist_directory,
    )

    vector_store = QdrantVectorStore(
        client=client,
        collection_name=config.collection
    )

    index = VectorStoreIndex.from_vector_store(
        vector_store,
        doc_store=doc_store,
        service_context=service_context,
    )
I found a possible solution, but it seems like there's an underlying issue.
When in 'reading mode' I create the index with VectorStoreIndex.from_vector_store(), I'm also passing the doc_store. However, internally a storage context is created with a totally new docstore. To solve this, I added this line:
Plain Text
index.storage_context.docstore = doc_store
I think this is not so clean, what do you think @Logan M?
Calling from_documents() will store the documents in the docstore, because I have store_nodes_override=True
Yea you can't use from_vector_store now with this method.

Need to do loaded_index = load_index_from_storage(new_storage_context, service_context=service_context) most likely
The above fix you suggest also works though
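Putting that suggestion together, here is a minimal sketch of the load path, assuming the full storage context was persisted at write time with index.storage_context.persist(persist_dir=...) as in the earlier example; the directory reuse is illustrative, not from the original code:
Plain Text
# Sketch: rebuild the storage context from the persisted docstore/index store
# plus the Qdrant vector store, then reload with load_index_from_storage
# instead of from_vector_store. Assumes persist_dir holds the files written
# by index.storage_context.persist(...) at indexing time.
from llama_index import StorageContext, load_index_from_storage

client = QdrantClient(path=config.persist_directory)
vector_store = QdrantVectorStore(client=client, collection_name=config.collection)

storage_context = StorageContext.from_defaults(
    vector_store=vector_store,
    persist_dir=config.persist_directory,  # illustrative: same dir used for persist()
)

loaded_index = load_index_from_storage(
    storage_context,
    service_context=service_context,  # optional
)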