Updated 3 months ago

```python
def _delete(self, doc_id: str, **delete_kwargs: Any) -> None:
    """Delete a document."""
    self._index_struct.delete(doc_id)
    self._vector_store.delete(doc_id)

def delete(self, doc_id: str) -> None:
    """Delete a Node."""
    if doc_id not in self.doc_id_dict:
        raise ValueError("doc_id not found in doc_id_dict")
    for vector_id in self.doc_id_dict[doc_id]:
        del self.nodes_dict[vector_id]
    del self.doc_id_dict[doc_id]
```
I don't think an exception should be thrown when doc_id is not in self.doc_id_dict, because it's possible that my index is constructed like this:
```python
return GPTQdrantIndex(
    nodes=[],
    client=qdrant_client_instance,
    service_context=service_context,
    collection_name=self._collection_name,
)
```

I only use the client to operate on the index without loading data into memory. This used to work fine in older versions.
11 comments
I agree, I don't see the harm in just returning instead of raising an error
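A minimal sketch of the suggested fix, using a stand-in class rather than the real index (the `doc_id_dict`/`nodes_dict` names mirror the snippet above): `delete()` simply returns when the doc_id was never loaded into the in-memory mapping, e.g. when the index was built with `nodes=[]` and the data lives only in the remote Qdrant collection.

```python
class VectorIndexSketch:
    """Stand-in for the index's in-memory bookkeeping (illustrative only)."""

    def __init__(self) -> None:
        self.doc_id_dict: dict[str, list[str]] = {}  # doc_id -> vector ids
        self.nodes_dict: dict[str, object] = {}      # vector id -> node

    def delete(self, doc_id: str) -> None:
        """Delete a Node; no-op if doc_id is unknown locally."""
        if doc_id not in self.doc_id_dict:
            return  # tolerate ids that only exist in the remote store
        for vector_id in self.doc_id_dict[doc_id]:
            del self.nodes_dict[vector_id]
        del self.doc_id_dict[doc_id]
```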
Thank you very much for solving this problem. I have another question to ask you.
What's up 👀
I have two documents, one about EIP-1 and the other about EIP-2, stored in qdrant. When I query for EIP-1, I always get results for EIP-2 instead. It seems the embedding similarities aren't discriminative enough to tell them apart. Do you have any good solutions?
i use the default embedding model
You could look into using required_keywords or exclude_keywords in your query 🤔

Also, maybe you can pre-split the documents into well defined sections before indexing?

Or, maybe you can try creating a composable index, like a vector index for each document and then a top-level index (but this will increase latency a bit)
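Conceptually, required_keywords / exclude_keywords act as a keyword filter over the retrieved chunks before the answer is synthesized. The stand-alone function below only illustrates that behavior; in the library itself they are passed as query kwargs rather than called directly.

```python
def keyword_filter(chunks, required_keywords=(), exclude_keywords=()):
    """Keep chunks containing every required keyword and no excluded one."""
    kept = []
    for text in chunks:
        lower = text.lower()
        if any(kw.lower() not in lower for kw in required_keywords):
            continue  # missing a required keyword
        if any(kw.lower() in lower for kw in exclude_keywords):
            continue  # contains an excluded keyword
        kept.append(text)
    return kept
```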
I have a lot of documents of various types, and I feel it's difficult to build a reasonable composable index. In addition, the query input is not always consistent, making it hard to automatically generate keywords.
Totally agree!

I think at this point it's a limitation of embeddings 🤔 my only further suggestion is increasing similarity_top_k (and using response_mode="compact" to keep response times reasonable), and maybe playing with chunk_size_limit
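For intuition, increasing similarity_top_k just means keeping more of the nearest chunks by embedding similarity, so even if the closest match is the wrong document, the right one still reaches the LLM. A minimal dependency-free sketch of that top-k retrieval step (the vectors and ids here are made up for illustration):

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, chunk_vecs, k=2):
    """Return the ids of the k chunks most similar to the query."""
    scored = sorted(chunk_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [chunk_id for chunk_id, _ in scored[:k]]
```

With k=1 only the single nearest chunk survives; bumping k widens the net at the cost of more tokens per response, which is why pairing it with response_mode="compact" was suggested.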
Thanks, I'll try this, but these trade-offs are really hard to balance 🫠
Yea 🤔 it's definitely tricky!

Lots of things to try though!