LlamaIndex

Updated 9 months ago

delete nodes from vector

At a glance

The community member is trying to delete nodes from a vector index based on the file from which those nodes were created. They found a method called delete_ref_doc in the documentation but don't know how to use it. The comments suggest that the community member should instantiate the vector store and call the delete method with the node ID. However, the community member doesn't have the node IDs, only the PDF file names they want to remove.

The community member is using the SimpleDirectoryReader and VectorStoreIndex.from_documents methods to create the vector index, and they are wondering how to store the document IDs during this process. They also found a way to set the filename as the ID, but this resulted in multiple IDs for a single PDF file.

The comments suggest that the community member should use a low-level API instead of the high-level API, and that the delete_ref_doc method will delete all nodes associated with a parent document ID. However, the community member doesn't know the document ID and only has the PDF file name. The community members discuss potential solutions, but there is no explicitly marked answer.

I am trying to delete nodes from my vector index based on the file those nodes were created from; that is, I want to delete all the nodes of a specific document. I found this in the documentation:

Plain Text

delete_ref_doc(ref_doc_id: str, delete_from_docstore: bool = False, **delete_kwargs: Any) -> None

but I don't know how to use it, or whether it is even the correct way.

38 comments

Instantiate the vector store, then call the method delete or adelete on it with the node ID

I have similar code..

Attachment

But I don't have a node ID; all I have is the name of the PDF that I want to remove from the vector store.

One way I can think of is to iterate through all the nodes and check whether file_name in that node's metadata matches the name of the PDF I want to delete; if it does, delete that node.
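The metadata scan described above could look roughly like this. It is only a sketch, not an answer confirmed in the thread: it assumes every node carries a file_name entry in its metadata (SimpleDirectoryReader adds one by default) and a ref_doc_id pointing at its parent document. The helper name ref_doc_ids_for_file is made up for illustration.

```python
from typing import Any, Iterable, Set


def ref_doc_ids_for_file(nodes: Iterable[Any], file_name: str) -> Set[str]:
    """Collect the parent document IDs of all nodes that came from `file_name`.

    Hypothetical helper: `nodes` is any iterable of objects exposing
    `.metadata` (a dict) and `.ref_doc_id`, as LlamaIndex nodes do.
    """
    matches: Set[str] = set()
    for node in nodes:
        # Skip nodes from other files and nodes without a parent document.
        if node.metadata.get("file_name") == file_name and node.ref_doc_id:
            matches.add(node.ref_doc_id)
    return matches
```

With the default in-memory stores, this could be fed vector_index.docstore.docs.values(), and each returned ID passed to vector_index.delete_ref_doc(doc_id, delete_from_docstore=True). Whether the docstore is populated at all depends on the vector store in use, so treat that part as an assumption to verify.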

How are you adding the documents into the vector store?

The ingestion process

When you add documents, the document ids related to that document are returned

In my case I store these ids to delete them

Plain Text

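# Import path assumes llama-index 0.10+; older releases use `from llama_index import ...`.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load every file in the docs directory, then build the index from them.
documents = SimpleDirectoryReader(
    input_dir="/content/handbook-bge-embeddings/docs"
).load_data()
vector_index = VectorStoreIndex.from_documents(documents)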

I am using this code to create a vector index from the PDF files in the docs directory, and then persisting the vector index.

How can I store the doc ID in this process?
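One possible way to keep the IDs at ingestion time is to build a file-name-to-ID map from the loaded documents before indexing them. This is a sketch under assumptions: it relies on each loaded document exposing doc_id and a file_name metadata entry, and map_files_to_doc_ids is a hypothetical helper name. The PDF reader typically yields one document per page, so expect several IDs per file.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def map_files_to_doc_ids(documents: Iterable[Any]) -> Dict[str, List[str]]:
    """Group document IDs by the file each document was loaded from.

    Hypothetical helper: each item must expose `.metadata` (a dict with a
    "file_name" key) and `.doc_id`, as LlamaIndex Document objects do.
    """
    mapping: Dict[str, List[str]] = defaultdict(list)
    for doc in documents:
        # Documents without a file name are grouped under a placeholder key.
        name = doc.metadata.get("file_name", "<unknown>")
        mapping[name].append(doc.doc_id)
    return dict(mapping)
```

The resulting dictionary can be saved next to the persisted index (as JSON, for example), so that deleting a PDF later means looking up its IDs and calling delete_ref_doc once per ID.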

Also, after creating the vector index, if I want to add a new doc to it I am using this:

Plain Text

nodes = parser.get_nodes_from_documents(documents2)
vector_index.insert_nodes(nodes)

OK, I also found this, which sets the filename as the ID while creating the vector index:

Plain Text

documents = SimpleDirectoryReader("./data", filename_as_id=True).load_data()

But now, for one PDF file, the doc IDs are like file_name_part1.... I thought there would be only one doc ID per PDF file, but that's not the case.
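If the per-part IDs are kept, they can still be grouped back to their source file. The sketch below assumes the IDs look like the file path followed by a "_part" suffix and a number, which matches what is described above but should be checked against the IDs your version actually produces. The helper doc_ids_for_file and the regular expression are illustrative only.

```python
import re
from pathlib import PurePath
from typing import Iterable, List

# Assumed ID shape: "<path to file>_part_<n>" or "<path to file>_part<n>".
_PART_SUFFIX = re.compile(r"_part_?\d+$")


def doc_ids_for_file(doc_ids: Iterable[str], file_name: str) -> List[str]:
    """Return the document IDs whose source file is named `file_name`."""
    selected: List[str] = []
    for doc_id in doc_ids:
        # Strip the part suffix, then compare only the final path component.
        source_path = _PART_SUFFIX.sub("", doc_id)
        if PurePath(source_path).name == file_name:
            selected.append(doc_id)
    return selected
```

The candidate IDs could come from vector_index.ref_doc_info.keys() where the vector store supports that property; each selected ID is then a valid argument for delete_ref_doc.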

Hm...

You're using their high level API

I don't know what from_documents returns, tbh

What I do is I separate ingestion phase from QA phase

Using a low level API

Like this:

Attachment

I call vector_store.add(nodes), and this method returns the IDs of the inserted nodes for later removal
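The attached code is not preserved here, so the following is only a guess at what such bookkeeping might look like: a small JSON registry that records the IDs returned by vector_store.add(nodes) under the source file name. The file format and both function names are assumptions made for illustration, not part of LlamaIndex.

```python
import json
from pathlib import Path
from typing import Dict, List


def _load(registry_path: Path) -> Dict[str, List[str]]:
    """Read the registry file, or start empty if it does not exist yet."""
    if registry_path.exists():
        return json.loads(registry_path.read_text())
    return {}


def record_node_ids(registry_path: Path, file_name: str, node_ids: List[str]) -> None:
    """Append the node IDs returned at insertion time under `file_name`."""
    registry = _load(registry_path)
    registry.setdefault(file_name, []).extend(node_ids)
    registry_path.write_text(json.dumps(registry, indent=2))


def pop_node_ids(registry_path: Path, file_name: str) -> List[str]:
    """Remove and return the node IDs recorded for `file_name`."""
    registry = _load(registry_path)
    node_ids = registry.pop(file_name, [])
    registry_path.write_text(json.dumps(registry, indent=2))
    return node_ids
```

At deletion time, the list returned by pop_node_ids would be handed to whatever node-level delete call the chosen vector store offers; which method that is varies by store and library version.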

I don't see their way of doing it as production code... idk

not going to read this whole thread lol but delete_ref_doc deletes by input document ID

Documents are broken into many nodes, and it will delete all nodes associated with a parent id

Didn't know this

Yes, several nodes

But if I pass one single (random) node that is related to a document, will it work?

A random node ID

Will it delete the other nodes?

Currently I store all the node IDs in a separate database and delete them one by one

Assuming node.ref_doc_id points to the parent document, it will work fine
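Put as code, going from any one node to its parent and deleting the whole document might look like the sketch below. It assumes the index exposes docstore.get_node and delete_ref_doc with the signature quoted at the top of the thread; delete_parent_document is a hypothetical helper name.

```python
from typing import Any, Optional


def delete_parent_document(index: Any, node_id: str) -> Optional[str]:
    """Delete every node that shares a parent document with `node_id`.

    Returns the parent document ID that was deleted, or None when the node
    has no parent reference and nothing was removed.
    """
    node = index.docstore.get_node(node_id)
    ref_doc_id = node.ref_doc_id
    if ref_doc_id is None:
        return None
    # Removes all nodes created from that document, not just this one.
    index.delete_ref_doc(ref_doc_id, delete_from_docstore=True)
    return ref_doc_id
```

This replaces a one-by-one delete loop over stored node IDs with a single call per document, provided the nodes were created with their parent reference intact.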

🤔 ok

Well, I have the doc name (the name of the PDF file). I want to remove all nodes associated with that doc from the vector index. How can I do that?

I don't know its doc ID.

Hmmm

Harder to do. Depends on how you inserted the document in the vector store. What if you have 2 documents with the same name?

That will not be the case, and if it is, then delete both.

What is your vector index?

Vector store

In memory?
