
Updated 4 months ago

I am implementing an application where I want to use Qdrant as a persistent database. However, when I split the documents with the auto-merging approach, the data is not saved properly. When I load the database and run a query, I get this error: 'raise ValueError(f"doc_id {doc_id} not found.") ValueError: doc_id 199bc490-2969-4b2f-be85-71c0d29a078b not found.' The traceback points to 'auto_merging_retriever.py', line 154, in _try_merging: 'nodes, is_changed_1 = self._get_parents_and_merge(nodes)'. Any suggestions on how I can use Qdrant properly? Thanks.
Qdrant only stores vectors + their nodes. But the auto-merging retriever relies on additional data in a persisted docstore.

When you use a vector store integration like Qdrant, the docstore is disabled to simplify storage. You can override this, but then you need to put the docstore somewhere (on disk, or in the Redis or MongoDB docstores).

VectorStoreIndex.from_documents(..., storage_context=storage_context, store_nodes_override=True)
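To make the failure mode concrete, here is a minimal pure-Python sketch. It does not use llama_index at all: `ToyNode`, `ToyDocstore`, and `merge_into_parents` are hypothetical stand-ins, and the merge logic is simplified on the assumption that the retriever looks up each retrieved leaf's parent in the docstore, which is what the traceback suggests. It only illustrates why an empty docstore produces the "doc_id ... not found" error.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyNode:
    node_id: str
    text: str
    parent_id: Optional[str] = None


class ToyDocstore:
    """Hypothetical stand-in for a docstore: a plain id -> node mapping."""

    def __init__(self) -> None:
        self._nodes: Dict[str, ToyNode] = {}

    def add_documents(self, nodes: List[ToyNode]) -> None:
        for node in nodes:
            self._nodes[node.node_id] = node

    def get_document(self, doc_id: str) -> ToyNode:
        if doc_id not in self._nodes:
            raise ValueError(f"doc_id {doc_id} not found.")
        return self._nodes[doc_id]


def merge_into_parents(leaves: List[ToyNode], docstore: ToyDocstore) -> List[ToyNode]:
    """Replace retrieved leaves by their parents, looked up in the docstore."""
    parents: Dict[str, ToyNode] = {}
    for leaf in leaves:
        if leaf.parent_id is not None:
            parent = docstore.get_document(leaf.parent_id)
            parents[parent.node_id] = parent
    return list(parents.values())


parent = ToyNode("p1", "full section text")
leaves = [ToyNode("l1", "first half", "p1"), ToyNode("l2", "second half", "p1")]

# Case 1: only the leaves live in the vector store; the docstore was never filled.
empty_docstore = ToyDocstore()
try:
    merge_into_parents(leaves, empty_docstore)
    lookup_error = None
except ValueError as exc:
    lookup_error = str(exc)

# Case 2: every node (parents and leaves) was added to the docstore.
full_docstore = ToyDocstore()
full_docstore.add_documents([parent, *leaves])
merged = merge_into_parents(leaves, full_docstore)
```

In the first case the parent lookup raises the same kind of `ValueError` as in the question; in the second it resolves both leaves to their shared parent. The real retriever applies more conditions before merging, so treat this only as a picture of the dependency on the docstore.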
Thanks, do you have a working example for reference?
Plain Text
from llama_index import StorageContext, VectorStoreIndex, load_index_from_storage

# vector_store, documents, and service_context are assumed to be defined already
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# create the index; store_nodes_override=True keeps the nodes in the docstore too
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    service_context=service_context,
    store_nodes_override=True,
)

# save the docstore/index store locally
index.storage_context.persist(persist_dir="./storage")

# load the index
new_storage_context = StorageContext.from_defaults(vector_store=vector_store, persist_dir="./storage")

# optional service context
loaded_index = load_index_from_storage(new_storage_context, service_context=service_context)
But here you are not storing the documents.
My write-side implementation is:
Plain Text
 
    client = QdrantClient(
        path=config.persist_directory
    )

    vector_store = QdrantVectorStore(
        client=client,
        collection_name=config.collection
    )

    if automerge_documents:
        doc_store = DocumentStore()
    else:
        doc_store = None

    storage_context = StorageContext.from_defaults(
        docstore=doc_store,
        vector_store=vector_store
    )

    index = VectorStoreIndex.from_documents(
        documents,
        storage_context=storage_context,
        service_context=service_context,
        show_progress=True,
        store_nodes_override=automerge_documents
    )

    if automerge_documents:
        chunk_sizes = config.chunk_sizes or [2048, 512, 128]
        node_parser = HierarchicalNodeParser.from_defaults(chunk_sizes=chunk_sizes)
        nodes = node_parser.get_nodes_from_documents(documents)
        # leaf_nodes = get_leaf_nodes(nodes)
        # storage_context.docstore.add_documents(nodes)
        index.build_index_from_nodes(nodes=nodes)

        # index.docstore.add_documents(nodes)
        doc_store.add_documents(nodes)
        doc_store.add_documents(documents)

        index.docstore.persist(persist_path=os.path.join(config.persist_directory, "docstore.json"))
But I'm not able to load it back.
Right now, the loading code is the following:
Plain Text
        if config.persist_directory:
            if automerge_documents:
                doc_store = DocumentStore.from_persist_path(
                    persist_path=os.path.join(config.persist_directory, "docstore.json")
                )
            else:
                doc_store = None

            # load the database
            logging.info("loading local database...")

            client = QdrantClient(
                path=config.persist_directory,
            )

            vector_store = QdrantVectorStore(
                client=client,
                collection_name=config.collection
            )

            index = VectorStoreIndex.from_vector_store(
                vector_store,
                doc_store=doc_store,
                service_context=service_context,
            )
I found a possible workaround, but it seems to point to an issue.
In 'reading mode', when I create the index with VectorStoreIndex.from_vector_store(), I also pass the doc_store. Internally, however, a storage context is created with a completely new docstore. To work around this, I added this line:
Plain Text
index.storage_context.docstore = doc_store
I think this is not very clean. What do you think, @Logan M?
Calling from_documents() will store the documents in the docstore, because I have store_nodes_override=True.
Yeah, you can't use from_vector_store with this method now.

You most likely need to do loaded_index = load_index_from_storage(new_storage_context, service_context=service_context).
The fix you suggested above also works, though.
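The workaround discussed in this thread can be pictured with a small pure-Python model. This is not llama_index's actual code: `ToyStorageContext` and `ToyIndex` are hypothetical classes that only mimic the behaviour reported above, namely that the factory method builds a fresh storage context and the docstore passed by the caller is not used.

```python
from typing import Dict, Optional


class ToyStorageContext:
    """Hypothetical storage context holding a docstore (a plain dict here)."""

    def __init__(self, docstore: Optional[Dict[str, str]] = None) -> None:
        self.docstore: Dict[str, str] = docstore if docstore is not None else {}


class ToyIndex:
    """Hypothetical index that mimics the behaviour reported in the thread."""

    def __init__(self, storage_context: ToyStorageContext) -> None:
        self.storage_context = storage_context

    @classmethod
    def from_vector_store(cls, vector_store: object, **kwargs: object) -> "ToyIndex":
        # Extra keyword arguments such as doc_store are accepted,
        # but a brand-new storage context is built regardless.
        return cls(ToyStorageContext())


# Pretend this was restored from docstore.json on disk.
loaded_docstore = {"p1": "parent text restored from docstore.json"}

index = ToyIndex.from_vector_store(object(), doc_store=loaded_docstore)
before = dict(index.storage_context.docstore)  # snapshot: still empty

# The workaround from the thread: attach the loaded docstore afterwards.
index.storage_context.docstore = loaded_docstore
after = index.storage_context.docstore
```

Under these assumptions the index starts with an empty docstore even though one was passed in, and the manual assignment is what makes the parent nodes reachable again. Loading through a storage context that already contains the docstore avoids the extra assignment.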