kokonutoil
Joined September 25, 2024
Hey guys! Does anyone have any suggestions on how to add multiple documents to a simple vector index in parallel?
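One common pattern is to run the slow, I/O-bound embedding calls concurrently with concurrent.futures.ThreadPoolExecutor, then insert the results from a single thread, because nothing here establishes that index inserts are thread-safe. This is a hedged sketch under that assumption. ToyVectorIndex and fake_embed are hypothetical stand-ins, not llama-index APIs; with a real index you would swap in your embedding call and the index's own insert method.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Vector = List[float]


class ToyVectorIndex:
    """Hypothetical in-memory stand-in for a simple vector index (not llama-index)."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, Vector]] = {}

    def insert(self, doc_id: str, text: str, vector: Vector) -> None:
        self._entries[doc_id] = (text, vector)

    def __len__(self) -> int:
        return len(self._entries)


def fake_embed(text: str) -> Vector:
    """Hypothetical embedding function; a real one would call an embedding model."""
    return [float(len(text)), float(sum(map(ord, text)) % 997)]


def insert_in_parallel(
    index: ToyVectorIndex,
    docs: Dict[str, str],
    embed: Callable[[str], Vector],
    max_workers: int = 4,
) -> None:
    doc_ids = list(docs)
    # Embedding is the slow, I/O-bound part, so run it in a thread pool.
    # pool.map yields results in input order, so vectors line up with doc_ids.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        vectors = list(pool.map(embed, [docs[d] for d in doc_ids]))
    # Mutate the index from one thread so we don't rely on it being thread-safe.
    for doc_id, vector in zip(doc_ids, vectors):
        index.insert(doc_id, docs[doc_id], vector)


if __name__ == "__main__":
    index = ToyVectorIndex()
    insert_in_parallel(index, {"a": "first doc", "b": "second doc", "c": "third"}, fake_embed)
    print(len(index))  # 3
```

Keeping the inserts sequential gives up a little speed, but the embedding calls usually dominate the cost anyway.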
Hey guys! I asked this on GH, but I think I may have better luck here :).

I'm unable to delete documents from an index in llama-index after updating to version 0.5.2. All I'm doing is:
  1. Breaking a document into smaller documents, all with the same document ID (doc_id)
  2. Calling index.insert(d) on each of these smaller documents
  3. Saving the index to a string and then to my local file system
  4. Loading the index from my file system
  5. Calling index.delete(doc_id)
Before the update, this deleted every document with that doc_id from the index. Now it doesn't seem to do anything: the file is slightly smaller after deleting, but most of the data is still there. I've attached before and after files for an example index.
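For comparison, here is a toy model of the five steps above and the behavior the post expects: deleting by doc_id removes every chunk that shares it, even after a save/load round-trip. ToyIndex and chunk are hypothetical and only loosely mirror an index that maps chunks back to a source doc_id; this is not llama-index's implementation, and the file-system write in step 3 is left out.

```python
import json
from typing import Dict, List


class ToyIndex:
    """Hypothetical index that tracks which chunks came from which doc_id."""

    def __init__(self) -> None:
        self.nodes: Dict[str, str] = {}          # chunk_id -> chunk text
        self.ref_doc: Dict[str, List[str]] = {}  # doc_id -> chunk_ids

    def insert(self, doc_id: str, text: str) -> None:
        chunk_id = f"{doc_id}-{len(self.ref_doc.get(doc_id, []))}"
        self.nodes[chunk_id] = text
        self.ref_doc.setdefault(doc_id, []).append(chunk_id)

    def delete(self, doc_id: str) -> None:
        # Remove every chunk that came from doc_id, not just one of them.
        for chunk_id in self.ref_doc.pop(doc_id, []):
            self.nodes.pop(chunk_id, None)

    def save_to_string(self) -> str:
        return json.dumps({"nodes": self.nodes, "ref_doc": self.ref_doc})

    @classmethod
    def load_from_string(cls, s: str) -> "ToyIndex":
        data = json.loads(s)
        index = cls()
        index.nodes, index.ref_doc = data["nodes"], data["ref_doc"]
        return index


def chunk(text: str, size: int) -> List[str]:
    """Split text into fixed-size pieces (hypothetical helper)."""
    return [text[i : i + size] for i in range(0, len(text), size)]


if __name__ == "__main__":
    index = ToyIndex()
    for piece in chunk("a long document " * 10, 40):  # step 1: same doc_id for all chunks
        index.insert("doc-1", piece)                  # step 2
    saved = index.save_to_string()                    # step 3 (disk write omitted)
    reloaded = ToyIndex.load_from_string(saved)       # step 4
    reloaded.delete("doc-1")                          # step 5
    print(reloaded.nodes, reloaded.ref_doc)  # {} {}
```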
Another question: what encoding do index.save_to_string() and index.load_from_string() default to?
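Assuming save_to_string() builds its output the way json.dumps does and returns a Python str (check the source for your llama-index version), the string itself has no encoding; one is only chosen when it is written to or read from a file. open(), Path.write_text() and Path.read_text() without encoding= fall back to the platform's locale encoding, so passing encoding="utf-8" on both sides is the safe choice. The helper names below are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_index_string(s: str, path: Path) -> None:
    # Be explicit: without encoding=, the platform's locale encoding is used,
    # and that can differ between the machine that saves and the one that loads.
    path.write_text(s, encoding="utf-8")


def load_index_string(path: Path) -> str:
    return path.read_text(encoding="utf-8")


if __name__ == "__main__":
    # json.dumps escapes non-ASCII characters by default (ensure_ascii=True) ...
    print(json.dumps({"text": "café"}))  # {"text": "caf\u00e9"}
    # ... but keeps them verbatim with ensure_ascii=False, so the file encoding matters.
    payload = json.dumps({"text": "café"}, ensure_ascii=False)
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "index.json"
        save_index_string(payload, path)
        assert json.loads(load_index_string(path)) == {"text": "café"}
```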
Hey guys! I noticed that when there are llama-index updates, the following error is pretty common when trying to query an index:
    KeyError: '__type__'
    [ERROR] KeyError: '__type__'
    ...
    File "/home/app/llama_index/docstore/registry.py", line 36, in load_docstore_from_dict
        type = docstore_dict[TYPE_KEY]

The only way I know of to fix it is to rebuild the index. Is there another way to address the issue?

The error happens when I call GPTSimpleVectorIndex.load_from_string(b, service_context=service_context).
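Short of a real format migration, one hedged workaround is to store the library version next to the saved index and rebuild from the source documents whenever that version changes or loading raises KeyError. In this sketch, load_fn and rebuild_fn are hypothetical callables you would back with GPTSimpleVectorIndex.load_from_string(...) and your own index-building code. The envelope format is an assumption, not something llama-index provides; the current version could come from importlib.metadata.version(...) on the installed package.

```python
import json
from typing import Callable, TypeVar

T = TypeVar("T")


def wrap_saved_index(index_str: str, library_version: str) -> str:
    """Store the library version that produced the index next to the index itself."""
    return json.dumps({"library_version": library_version, "index": index_str})


def load_or_rebuild(
    saved: str,
    current_version: str,
    load_fn: Callable[[str], T],
    rebuild_fn: Callable[[], T],
) -> T:
    envelope = json.loads(saved)
    if envelope.get("library_version") != current_version:
        # The serialization format may have changed between releases: rebuild up front.
        return rebuild_fn()
    try:
        return load_fn(envelope["index"])
    except KeyError:
        # e.g. KeyError: '__type__' from a docstore saved by an older release.
        return rebuild_fn()
```

This still rebuilds the index, but only when needed and without a failed query first.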
kokonutoil

Hey @Logan M, about a week ago you pushed a fix for the issue with deleting documents from an index stored as a JSON file. I just updated llama-index to the latest version, but I think part of the issue still persists: when a document is deleted, all of its data is removed from the index_struct object, but nothing is removed from the docstore object. I think the cause is this method:
    def _delete(self, doc_id: str, **delete_kwargs: Any) -> None:
        """Delete a document."""
        self._index_struct.delete(doc_id)
        self._vector_store.delete(doc_id)

You might need to add self._docstore.delete_document(doc_id) to this method as well.
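The suggestion amounts to making _delete clean up every place a document lives. The sketch below models that with hypothetical toy classes (ToyIndexStruct, ToyVectorStore, ToyDocstore, ToyIndexWithDocstore). Only the method names delete and delete_document echo the snippet above; this is not llama-index's actual code.

```python
from typing import Any, Dict, List, Set


class ToyIndexStruct:
    def __init__(self) -> None:
        self.doc_ids: Set[str] = set()

    def delete(self, doc_id: str) -> None:
        self.doc_ids.discard(doc_id)


class ToyVectorStore:
    def __init__(self) -> None:
        self.vectors: Dict[str, List[float]] = {}

    def delete(self, doc_id: str) -> None:
        self.vectors.pop(doc_id, None)


class ToyDocstore:
    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}

    def delete_document(self, doc_id: str) -> None:
        self.docs.pop(doc_id, None)


class ToyIndexWithDocstore:
    def __init__(self) -> None:
        self._index_struct = ToyIndexStruct()
        self._vector_store = ToyVectorStore()
        self._docstore = ToyDocstore()

    def insert(self, doc_id: str, text: str, vector: List[float]) -> None:
        self._index_struct.doc_ids.add(doc_id)
        self._vector_store.vectors[doc_id] = vector
        self._docstore.docs[doc_id] = text

    def _delete(self, doc_id: str, **delete_kwargs: Any) -> None:
        """Delete a document from every store that references it."""
        self._index_struct.delete(doc_id)
        self._vector_store.delete(doc_id)
        self._docstore.delete_document(doc_id)  # the proposed extra line
```

Without the last line, the toy docstore keeps the text after _delete, which matches the symptom described in the post.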