Updated 9 months ago

Hello, I have a question: I used SimpleDirectoryReader with load_data() to load a file, to which I added some metadata.
I now want to delete from the VectorStoreIndex the docs that match specific values of that metadata.
I did not find a way to get the node_id / ref_doc_id for each "file" I have uploaded, so that I can then call index.delete_ref_doc (https://docs.llamaindex.ai/en/latest/module_guides/indexing/document_management/#deletion).
Is there any way to do it without storing the ref_doc_id for each uploaded file in a DB and retrieving it from there when I need to delete?

I might have found this https://github.com/run-llama/llama_index/discussions/8930 😄
2 comments
You need the ref_doc_id

Usually you can set the document ids when inserting to something that makes sense (filenames, etc.)

Or, you can query the vector db, and check the returned nodes

Plain Text
retriever = index.as_retriever()
nodes = retriever.retrieve("test")
print(nodes[0].node.ref_doc_id)
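
For the first option (setting document ids at insert time), here is a minimal sketch, not an official recipe. `assign_filename_ids` is a hypothetical helper, and the `SimpleNamespace` objects are only stand-ins for the Document objects that load_data() returns; it assumes each document has a `file_name` entry in its metadata and a writable `id_` attribute.

```python
from types import SimpleNamespace


def assign_filename_ids(documents, key="file_name"):
    """Give each document a predictable id built from its metadata.

    Hypothetical helper: a file can be split into several documents,
    so a running part number keeps the ids unique per file.
    """
    seen = {}
    for doc in documents:
        name = doc.metadata[key]
        part = seen.get(name, 0)
        seen[name] = part + 1
        doc.id_ = f"{name}_part_{part}"
    return documents


# Stand-ins for the documents that load_data() would return.
docs = [
    SimpleNamespace(metadata={"file_name": "report.pdf"}, id_=None),
    SimpleNamespace(metadata={"file_name": "report.pdf"}, id_=None),
    SimpleNamespace(metadata={"file_name": "notes.txt"}, id_=None),
]
assign_filename_ids(docs)
print([d.id_ for d in docs])
```

With ids like these, the ref_doc_id can be rebuilt from the file name later instead of being looked up. As far as I know, SimpleDirectoryReader also accepts filename_as_id=True, which does something similar out of the box.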
thank you @Logan M
I needed to get all the nodes based on some metadata, without doing a retrieve.
I managed it by storing the ref_doc_ids in MongoDB (keyed by my metadata) and then retrieving them to delete the docs with delete_ref_doc 🙂
It kinda worked, even if it's not the best 😄
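
A possible way to skip the external DB, sketched under assumptions: the index exposes a `ref_doc_info` mapping of ref_doc_id to an object with a `metadata` dict, and you filter on that before deleting. `find_ref_doc_ids` is a hypothetical helper and the metadata keys (`customer`, `year`) are made up; the `SimpleNamespace` objects stand in for the real entries so the snippet runs without an index.

```python
from types import SimpleNamespace


def find_ref_doc_ids(ref_doc_info, filters):
    """Return the ref_doc_ids whose metadata matches every key/value in filters."""
    return [
        ref_doc_id
        for ref_doc_id, info in ref_doc_info.items()
        if all((info.metadata or {}).get(k) == v for k, v in filters.items())
    ]


# Stand-in for index.ref_doc_info (assumed shape: ref_doc_id -> object with .metadata).
fake_ref_doc_info = {
    "doc-1": SimpleNamespace(metadata={"customer": "acme", "year": 2023}),
    "doc-2": SimpleNamespace(metadata={"customer": "globex", "year": 2023}),
    "doc-3": SimpleNamespace(metadata={"customer": "acme", "year": 2024}),
}

to_delete = find_ref_doc_ids(fake_ref_doc_info, {"customer": "acme"})
print(to_delete)

# With a real index this would look roughly like (untested assumption):
# for ref_doc_id in find_ref_doc_ids(index.ref_doc_info, {"customer": "acme"}):
#     index.delete_ref_doc(ref_doc_id, delete_from_docstore=True)
```

One caveat: this only helps if the docstore actually tracks the source documents. With some external vector stores the docstore can be empty, in which case keeping your own mapping (like the MongoDB approach) may still be needed.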