def _delete(self, doc_id: str, **delete_kwargs: Any) -> None:
    """Delete a document."""
    self._index_struct.delete(doc_id)
    self._vector_store.delete(doc_id)
def delete(self, doc_id: str) -> None:
    """Delete a Node."""
    if doc_id not in self.doc_id_dict:
        raise ValueError("doc_id not found in doc_id_dict")
    for vector_id in self.doc_id_dict[doc_id]:
        del self.nodes_dict[vector_id]
    del self.doc_id_dict[doc_id]
I don't think an exception should be thrown when doc_id is not in self.doc_id_dict, because it's possible that my index is constructed like this:
return GPTQdrantIndex(
nodes=[],
client=qdrant_client_instance,
service_context=service_context,
collection_name=self._collection_name,
)
I only use the client to operate on the index without loading data into memory. This used to work fine in older versions.
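A minimal sketch of what I have in mind (my own suggestion, not the library's current code): treat an unknown doc_id as a no-op instead of raising, so client-only indices keep working:

def delete(self, doc_id: str) -> None:
    """Delete a Node; silently return if doc_id was never loaded in memory."""
    if doc_id not in self.doc_id_dict:
        # The index may have been built with nodes=[] and operate purely
        # through the vector store client, so there is nothing to clean up.
        return
    for vector_id in self.doc_id_dict[doc_id]:
        del self.nodes_dict[vector_id]
    del self.doc_id_dict[doc_id]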
I have two documents, one about EIP-1 and the other about EIP-2, both stored in Qdrant. When I query for EIP-1, I always get results for EIP-2 instead. It seems the two documents' embeddings are too similar for retrieval to tell them apart. Do you have any good solutions?
You could look into using required_keywords or exclude_keywords in your query 🤔
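For example (the query text and keywords are made up, and the exact kwargs depend on the version you're on):

response = index.query(
    "What does EIP-1 specify?",
    required_keywords=["EIP-1"],
    exclude_keywords=["EIP-2"],
)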
Also, maybe you can pre-split the documents into well-defined sections before indexing?
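Something like this (a rough sketch; split_into_sections and the heading delimiter are placeholders for whatever structure your docs actually have, and the import path depends on your version, gpt_index vs llama_index):

from llama_index import Document

def split_into_sections(raw_text: str) -> list[Document]:
    # Assumes markdown-style "## " headings delimit sections; adjust the
    # delimiter to match your documents' real structure.
    sections = raw_text.split("\n## ")
    return [Document(text=s) for s in sections if s.strip()]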
Or, maybe you can try creating a composable index, like a vector index for each document and then a top-level index (but this will increase latency a bit)
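Roughly like this (a sketch assuming the 0.5-era API; the index summaries are placeholders you'd write per document):

from llama_index import GPTSimpleVectorIndex, GPTListIndex, ComposableGraph

# One vector index per document, then a top-level list index over them.
doc_indices = [GPTSimpleVectorIndex.from_documents([doc]) for doc in documents]
graph = ComposableGraph.from_indices(
    GPTListIndex,
    doc_indices,
    index_summaries=["Summary of EIP-1", "Summary of EIP-2"],
)
response = graph.query("What does EIP-1 specify?")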
I have a lot of documents of various types, and I feel it's difficult to build a reasonable composable index. In addition, the query input is not always consistent, so it's hard to generate keywords automatically.
I think at this point it's a limitation of embeddings 🤔 my only further suggestion is increasing similarity_top_k (and using response_mode="compact" to keep response times reasonable), and maybe playing with chunk_size_limit
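For reference, the knobs I mean (the values here are just examples to tune, and the service_context would be passed when building the index, not at query time):

from llama_index import ServiceContext

# Smaller chunks at index time can make each embedding more focused.
service_context = ServiceContext.from_defaults(chunk_size_limit=512)

# Retrieve more candidates, but stuff them into fewer LLM calls.
response = index.query(
    "What does EIP-1 specify?",
    similarity_top_k=5,
    response_mode="compact",
)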