Updated 9 months ago

delete nodes from vector

At a glance

The community member is trying to delete nodes from a vector index based on the file from which those nodes were created. They found a method called delete_ref_doc in the documentation but don't know how to use it. The comments suggest that the community member should instantiate the vector store and call the delete method with the node ID. However, the community member doesn't have the node IDs, only the PDF file names they want to remove.

The community member is using the SimpleDirectoryReader and VectorStoreIndex.from_documents methods to create the vector index, and they are wondering how to store the document IDs during this process. They also found a way to set the filename as the ID, but this resulted in multiple IDs for a single PDF file.

The comments suggest that the community member should use a low-level API instead of the high-level API, and that the delete_ref_doc method will delete all nodes associated with a parent document ID. However, the community member doesn't know the document ID and only has the PDF file name. The community members discuss potential solutions, but there is no explicitly marked answer.

I am trying to delete nodes from my vector index based on the file they were created from; that is, I want to delete all the nodes of one specific document. I found this in the documentation:
Plain Text
delete_ref_doc(ref_doc_id: str, delete_from_docstore: bool = False, **delete_kwargs: Any) -> None
but I don't know how to use it, or whether it is even the correct way.
38 comments
Instantiate the vector store, then call the delete or adelete method on it with the node ID.
I have similar code..
But I don't have a node ID; all I have is the name of the PDF that I want to remove from the vector store.

One way I can think of is to iterate through all the nodes and check whether file_name in that node's metadata matches the name of the PDF I want to delete; if it does, delete that node.
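The scan described above could look roughly like the sketch below. It assumes the nodes are reachable as a mapping of node ID to node (for a default in-memory index, `vector_index.docstore.docs` is such a mapping; many external vector stores keep the nodes elsewhere), and that each node's `metadata` carries the `file_name` key that `SimpleDirectoryReader` sets by default. `node_ids_for_file` is a hypothetical helper name, not a library function.

```python
def node_ids_for_file(docs, file_name):
    """Collect the IDs of nodes whose metadata names the given file.

    `docs` is assumed to be a mapping of node_id -> node, for example
    `vector_index.docstore.docs` on a default in-memory index.
    """
    matches = []
    for node_id, node in docs.items():
        # Nodes without metadata are skipped rather than treated as errors.
        metadata = getattr(node, "metadata", None) or {}
        if metadata.get("file_name") == file_name:
            matches.append(node_id)
    return matches
```

The function only selects IDs; how they are then deleted depends on the index and vector store in use. A full scan is linear in the number of nodes, so it suits small indexes better than large ones.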
How are you adding the documents into the vector store?
The ingestion process
When you add documents, the document ids related to that document are returned
In my case I store these ids to delete them
Plain Text
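# Import path assumes llama-index >= 0.10; older releases use `from llama_index import ...`
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader(
    input_dir="/content/handbook-bge-embeddings/docs"
).load_data()

vector_index = VectorStoreIndex.from_documents(documents)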

I am using this code to create a vector index from the PDF files in the docs directory, and then persisting the vector index.

How can I store the doc IDs in this process?
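One possible way to keep the doc IDs is to record them from the loaded documents before building the index. The sketch below assumes each loaded document exposes a `doc_id` attribute and a `metadata` dict containing `file_name`, as documents produced by `SimpleDirectoryReader` normally do. `doc_ids_by_file` is a hypothetical helper name.

```python
from collections import defaultdict


def doc_ids_by_file(documents):
    """Map each source file name to the IDs of the documents loaded from it."""
    ids = defaultdict(list)
    for doc in documents:
        # Documents lacking a file_name are grouped under a placeholder key.
        file_name = doc.metadata.get("file_name", "<unknown>")
        ids[file_name].append(doc.doc_id)
    return dict(ids)
```

Calling `doc_ids_by_file(documents)` right after `load_data()` and saving the result (for example as JSON next to the persisted index) would give a file-name-to-ID lookup to use at deletion time.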
Also, after creating the vector index, if I want to add a new doc to it, I am using this:
Plain Text
nodes = parser.get_nodes_from_documents(documents2)
vector_index.insert_nodes(nodes)
OK, I also found this option, which sets the filename as the ID when loading the documents for the vector index:
Plain Text
documents = SimpleDirectoryReader("./data", filename_as_id=True).load_data()
But now, for one PDF file, the doc IDs look like file_name_part1.... I thought there would be only one doc ID per PDF file, but that's not the case.
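The multiple IDs most likely arise because the PDF loader produces one document per page, each with its own suffixed ID. A minimal sketch for grouping such IDs back under their file follows; it assumes the IDs end in a `_part_<n>` (or `_part<n>`) suffix, as the message above suggests, and `group_part_ids` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Assumed suffix shape: "_part_0", "_part_12", or "_part1" at the end of the ID.
_PART_SUFFIX = re.compile(r"_part_?\d+$")


def group_part_ids(doc_ids):
    """Group per-page document IDs under the ID prefix they share."""
    groups = defaultdict(list)
    for doc_id in doc_ids:
        base = _PART_SUFFIX.sub("", doc_id)  # IDs without a suffix map to themselves
        groups[base].append(doc_id)
    return dict(groups)
```

With this grouping, removing one PDF means deleting every ID listed under its prefix rather than a single ID.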
You're using their high level API
I don't know what from_documents returns, tbh
What I do is I separate ingestion phase from QA phase
Using a low level API
I call vector_store.add(nodes), and this method returns the IDs of the inserted nodes for later removal
I don't see the way they do it as production code... idk
Not going to read this whole thread lol, but delete_ref_doc deletes by input document ID.

Documents are broken into many nodes, and it will delete all nodes associated with a parent ID.
Didn't know this
Yes, several nodes
But if I pass one single (random) node that is related to a document, will it work?
A random node ID
Will it delete the other nodes?
Currently I store all the node ids in a separate database and delete one by one
Assuming node.ref_doc_id points to the parent document, it will work fine
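The exchange above can be sketched as follows: starting from any one node ID, look up the node, read its `ref_doc_id`, and pass that parent ID to `delete_ref_doc`. This is a sketch under assumptions, namely that the index exposes a `docstore` with `get_node(node_id)` and that the node's `ref_doc_id` is set. `delete_document_of_node` is a hypothetical helper name.

```python
def delete_document_of_node(index, node_id):
    """Delete every node that shares a parent document with `node_id`.

    Returns the parent document ID that was deleted.
    """
    node = index.docstore.get_node(node_id)
    ref_doc_id = node.ref_doc_id
    if ref_doc_id is None:
        # Without a parent pointer there is nothing to delete by.
        raise ValueError(f"node {node_id!r} has no ref_doc_id")
    # delete_from_docstore=True also drops the document's docstore entries.
    index.delete_ref_doc(ref_doc_id, delete_from_docstore=True)
    return ref_doc_id
```

So a single node ID is enough only as a way to find the parent; the deletion itself is keyed on the document ID.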
Well, I have the doc name (the name of the PDF file). I want to remove all nodes associated with that doc from the vector index; how can I do that?
I don't know its doc ID.
Harder to do. Depends on how you inserted the document in the vector store. What if you have 2 documents with the same name?
That will not be the case, and if it is, then delete both.
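One possible approach to deleting by file name, sketched under assumptions: the index exposes a `ref_doc_info` mapping of document ID to an object with a `metadata` dict (this generally requires the index to track documents in its docstore, which may not hold for every external vector store), and that metadata includes `file_name`. Every matching document is deleted, so two files with the same name would both be removed, as requested above. `delete_file_from_index` is a hypothetical helper name.

```python
def delete_file_from_index(index, file_name):
    """Delete all documents (and their nodes) that came from `file_name`.

    Returns the list of document IDs that were deleted.
    """
    removed = []
    # Copy the items first so deletions cannot disturb the iteration.
    for ref_doc_id, info in list(index.ref_doc_info.items()):
        metadata = getattr(info, "metadata", None) or {}
        if metadata.get("file_name") == file_name:
            index.delete_ref_doc(ref_doc_id, delete_from_docstore=True)
            removed.append(ref_doc_id)
    return removed
```

If the index is persisted to disk, it would need to be persisted again after the deletion for the change to survive a reload.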
What is your vector index?