Updated last year

how do I remove nodes from index by document name?

@kapa.ai how do I remove nodes from index by document name?
Please stand by @Raz as I search the knowledge base. We are experiencing especially high traffic right now and might take longer than usual. :wave:
----------------------
To remove nodes from an index by document name in LlamaIndex, you can use the delete_ref_doc method. This method deletes a document and its nodes using the reference document ID (ref_doc_id). Here is an example of how to use it:

index.delete_ref_doc("doc_id_0", delete_from_docstore=True)


In this example, "doc_id_0" is the ID of the document you want to delete. The delete_from_docstore parameter is optional and defaults to False. If you set it to True, it will also delete the document from the document store.

Please note that the tree index currently does not support deletion.
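
If the ref_doc_id is not known up front, one possible approach is to look it up by file name first and then call delete_ref_doc. The sketch below is a minimal illustration, not an official recipe. It assumes the index exposes a ref_doc_info mapping whose entries carry a metadata dict with a "file_name" key (which depends on how the documents were loaded and on the library version); delete_by_file_name is a hypothetical helper name.

```python
from typing import Any, List


def delete_by_file_name(index: Any, file_name: str) -> List[str]:
    """Delete every reference document whose metadata names `file_name`.

    Returns the ref_doc_ids that were deleted.
    """
    matches = []
    for ref_doc_id, info in index.ref_doc_info.items():
        # Assumption: each entry exposes a `metadata` dict with a "file_name" key.
        metadata = getattr(info, "metadata", None) or {}
        if metadata.get("file_name") == file_name:
            matches.append(ref_doc_id)

    # Collect first and delete afterwards, so the mapping is not
    # modified while it is being iterated.
    for ref_doc_id in matches:
        index.delete_ref_doc(ref_doc_id, delete_from_docstore=True)
    return matches
```

Calling delete_by_file_name(index, "report.pdf") would then remove every document loaded from a file with that name, provided the metadata assumption holds for your index.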

Sources:
  • Document Management
  • Structured Store Index
  • Base Index Class
  • Table Index
  • Vector Store Index
  • List Index
  • Tree Index
----------------------
Note: you can ask me a follow up question by @ mentioning me again :speech_balloon:
----------------------
@kapa.ai how do i get the ref_doc_id based on the filename?
Let me check the knowledge base for you. One moment! :books:
----------------------
To get the ref_doc_id based on the filename, you can use the SimpleDirectoryReader from LlamaIndex. When using the SimpleDirectoryReader, you can automatically set the document id_ to be the full path to each document, which in this case would be the filename. Here is an example of how to use it:

from llama_index import SimpleDirectoryReader

documents = SimpleDirectoryReader("./data", filename_as_id=True).load_data()
print([x.doc_id for x in documents])


In this example, ./data is the directory where your documents are stored. The filename_as_id=True argument sets each document's id_ to the full path of the file it was loaded from, rather than a random ID. The load_data() function loads the data from the directory, and the print statement shows the resulting doc_id of every loaded document.

Please note that the doc_id is the same as the ref_doc_id in this context.
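
Because the IDs are paths rather than bare file names, a small lookup step is still needed to go from a file name to its IDs. The sketch below is one way to do that on a plain list of ID strings. It assumes IDs are file paths, and that a file split into several documents may get a suffix such as _part_0 appended (this suffix is an assumption about some library versions, so check the IDs you actually see); doc_ids_for_file is a hypothetical helper name.

```python
from pathlib import PurePath
from typing import Iterable, List


def doc_ids_for_file(doc_ids: Iterable[str], file_name: str) -> List[str]:
    """Return the IDs whose final path component belongs to `file_name`."""
    matches = []
    for doc_id in doc_ids:
        name = PurePath(doc_id).name  # last path component of the ID
        # Match the bare file name, or the file name followed by an
        # assumed "_part_<n>" suffix for multi-part files.
        if name == file_name or name.startswith(file_name + "_part_"):
            matches.append(doc_id)
    return matches
```

For example, doc_ids_for_file([d.doc_id for d in documents], "report.pdf") would give the IDs to pass to delete_ref_doc one by one.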

Source: Customizing Documents
----------------------
@kapa.ai how do i list out all the documents in the index
Let me check the knowledge base for you. One moment! :books:
----------------------
To list the documents in the index, you can read the ref_doc_info property of the index object. Here is an example of how to use it:

print(index.ref_doc_info)


In this example, index is your index object. The ref_doc_info property returns a dictionary that maps each ref_doc_id to information about the nodes created from that document. You can iterate over this dictionary to see every document the index tracks.

Please note that the actual method to retrieve all documents may vary depending on the specific index type you are using.
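
As a rough sketch of turning that into a readable listing, the helper below walks a ref_doc_info mapping and reports how many nodes each document produced. It assumes the index exposes ref_doc_info and that each entry has a node_ids list (the attribute name may differ between library versions); list_ref_docs is a hypothetical helper name.

```python
from typing import Any, List, Tuple


def list_ref_docs(index: Any) -> List[Tuple[str, int]]:
    """Return (ref_doc_id, number_of_nodes) pairs, sorted by ref_doc_id."""
    return sorted(
        # Assumption: each entry exposes its node IDs as `node_ids`.
        (ref_doc_id, len(info.node_ids))
        for ref_doc_id, info in index.ref_doc_info.items()
    )
```

Printing each pair from list_ref_docs(index) gives a quick overview of which documents are present before deciding what to delete.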

Source: Base Index Class
----------------------
@kapa.ai is doc_id and ref_doc_id the same thing?
Let me check the knowledge base for you. One moment! :books:
----------------------
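Largely, yes. In LlamaIndex, a document's doc_id is its unique identifier, and each node created from that document stores the same value as its ref_doc_id. The two names refer to the same ID seen from different sides: doc_id from the document itself, ref_doc_id from the nodes that point back to it. A node also has its own ID, which is distinct from its ref_doc_id.

For example, when you want to delete a document and its associated nodes from the index, you pass the document's doc_id as the ref_doc_id argument of the delete_ref_doc method.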
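
To make the relationship concrete, the sketch below groups nodes by the document they came from. It assumes each node object exposes node_id and ref_doc_id attributes, and that the nodes are obtained from somewhere such as index.docstore.docs.values() (an assumption that depends on your setup); group_nodes_by_ref_doc is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_nodes_by_ref_doc(nodes: Iterable[Any]) -> Dict[str, List[str]]:
    """Map each ref_doc_id to the IDs of the nodes that reference it."""
    groups = defaultdict(list)
    for node in nodes:
        # node.node_id identifies the node itself;
        # node.ref_doc_id identifies the document it was created from.
        groups[node.ref_doc_id].append(node.node_id)
    return dict(groups)
```

Each key of the result is a value you could pass to delete_ref_doc, and the list under it shows which nodes would go away with it.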

Sources:
  • Customizing Documents
  • Base Index Class
  • Document Store
----------------------
@kapa.ai What's the difference between docstore.delete_ref_doc() and docstore.delete_document()?
Let me check the knowledge base for you. One moment! :books:
----------------------
The delete_ref_doc and delete_document methods in LlamaIndex's Document Store (docstore) both serve to delete documents, but they operate slightly differently.

  • delete_ref_doc(ref_doc_id: str, raise_error: bool = True): This method deletes a reference document (ref_doc) and all its associated nodes from the document store. The ref_doc_id is the unique identifier of the reference document you want to delete.
  • delete_document(doc_id: str, raise_error: bool = True): This method deletes a document from the document store. The doc_id is the unique identifier of the document you want to delete.
In summary, delete_ref_doc removes the reference document together with all of its associated nodes, while delete_document removes only the single entry stored under the given doc_id.
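
The difference can be illustrated with a toy in-memory store. This is not the LlamaIndex implementation; ToyDocstore and its fields are hypothetical and exist only to show the two deletion behaviors side by side, under the assumption that nodes are stored by their own ID and grouped by ref_doc_id.

```python
from typing import Dict, List


class ToyDocstore:
    """Toy stand-in for a document store (illustration only)."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}            # node_id -> text
        self.ref_docs: Dict[str, List[str]] = {}  # ref_doc_id -> node_ids

    def add_node(self, node_id: str, text: str, ref_doc_id: str) -> None:
        self.docs[node_id] = text
        self.ref_docs.setdefault(ref_doc_id, []).append(node_id)

    def delete_document(self, doc_id: str, raise_error: bool = True) -> None:
        """Remove one stored entry; sibling nodes are left in place."""
        if doc_id not in self.docs:
            if raise_error:
                raise ValueError(f"doc_id {doc_id} not found")
            return
        del self.docs[doc_id]
        for node_ids in self.ref_docs.values():
            if doc_id in node_ids:
                node_ids.remove(doc_id)

    def delete_ref_doc(self, ref_doc_id: str, raise_error: bool = True) -> None:
        """Remove a source document and every node created from it."""
        node_ids = self.ref_docs.pop(ref_doc_id, None)
        if node_ids is None:
            if raise_error:
                raise ValueError(f"ref_doc_id {ref_doc_id} not found")
            return
        for node_id in node_ids:
            self.docs.pop(node_id, None)
```

In this toy model, deleting one node with delete_document leaves the other nodes of the same source document untouched, whereas delete_ref_doc clears all of them in one call.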

Sources:
  • Document Store
----------------------
@kapa.ai how do i get the ref_doc_id from the docstore
Let me check the knowledge base for you. One moment! :books:
@Raz That's a great question, but unfortunately I am only capable of responding to a maximum of three questions per thread. Don't hesitate to create a new thread for any further questions you have though. I'm here to help!