Updated 2 years ago

I've been trying out updating and

I've been trying out updating and deleting documents from the index, but it seems like that is an issue?
Plain Text
doc = Document(text=txt)
simple_index.insert(doc)

But this uses the node parser to break the doc down. Now if I want to update or delete the original doc, it seems I can't, because the docstore has only nodes and I have only Document objects.

Plain Text
# update
doc.text = "how to make wealth = build a startup"
simple_index.update(doc) # fails silently because delete fails (this works by deleting and inserting)

# delete
simple_index.delete(doc.doc_id)  # fails silently
len(simple_index.docstore.docs)
11 comments
I thiiiink this is fixed in this PR https://github.com/jerryjliu/llama_index/pull/1195

But for whatever reason that PR has been sitting there for a while. Maybe someone should swoop in and steal the bug fix glory lol
I checked that out but I don't know if that is the correct fix. So basically I think this is a design issue.
Imagine you create an index with documents. That is easier for a user to think about than nodes, but the docstore inside the index will only store the nodes and not the documents, right? So if I have to update a document, I will have to figure out the exact nodes that document created and update those too.

This is my understanding of how the index and docstore work together. Is that correct?
Hmmm, yea I think you are right. Double-checked the code and it all operates at the node level 🤔

So for your update/delete example to work, you also need to operate at the node level
This relationship could be managed better lol
The delete and update functions could also check ref_doc_id of each node to make this work?
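To make that suggestion concrete, here is a minimal sketch of a delete that matches on each node's ref_doc_id. It is a toy model, not the llama_index implementation: `Node`, `ToyDocStore`, and `delete_by_ref_doc_id` are hypothetical names made up for illustration, and only the `ref_doc_id` field name comes from the discussion above.

```python
from dataclasses import dataclass, field
from typing import Dict, List
import uuid


@dataclass
class Node:
    """Toy stand-in for a text chunk (assumption: not the real llama_index Node)."""
    text: str
    ref_doc_id: str  # id of the source document this chunk was cut from
    node_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class ToyDocStore:
    """Hypothetical docstore that, like the one discussed, holds only nodes."""

    def __init__(self) -> None:
        self.docs: Dict[str, Node] = {}

    def add_nodes(self, nodes: List[Node]) -> None:
        for node in nodes:
            self.docs[node.node_id] = node

    def delete_by_ref_doc_id(self, doc_id: str) -> int:
        """Remove every node whose ref_doc_id matches; return how many were removed."""
        to_remove = [nid for nid, n in self.docs.items() if n.ref_doc_id == doc_id]
        for nid in to_remove:
            del self.docs[nid]
        return len(to_remove)


store = ToyDocStore()
store.add_nodes([
    Node("chunk a1", "doc-a"),
    Node("chunk a2", "doc-a"),
    Node("chunk b1", "doc-b"),
])
removed = store.delete_by_ref_doc_id("doc-a")
remaining = len(store.docs)
```

Returning the number of removed nodes gives the caller a way to notice when nothing matched, instead of the delete failing silently. The linear scan is fine for a sketch but would get slow on a large docstore.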
Yep, something like that should help. As a user I should only have to worry about documents; it's much easier for me to build things that way. The framework should handle the mapping of documents to nodes and perform the operations.
Maybe the solution is some way to map documents to nodes in the docstore. I can do a bit of digging and maybe raise an issue, and if someone is interested they can put up a PR, or else I can take it up, maybe next week.
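A rough sketch of what such a mapping might look like, under loud assumptions: `DocumentAwareStore`, `split_into_chunks`, and the `doc_id:index` node id scheme are all hypothetical, and the fixed-width splitter only stands in for a real node parser. None of this is llama_index API.

```python
from typing import Dict, List


def split_into_chunks(text: str, chunk_size: int = 20) -> List[str]:
    """Naive fixed-width splitter standing in for a real node parser (assumption)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


class DocumentAwareStore:
    """Hypothetical store that remembers which nodes each document produced."""

    def __init__(self) -> None:
        self.nodes: Dict[str, str] = {}               # node_id -> chunk text
        self.doc_to_nodes: Dict[str, List[str]] = {}  # doc_id -> node_ids

    def insert(self, doc_id: str, text: str) -> None:
        if doc_id in self.doc_to_nodes:
            raise KeyError(f"{doc_id} already inserted; use update()")
        node_ids = []
        for i, chunk in enumerate(split_into_chunks(text)):
            node_id = f"{doc_id}:{i}"
            self.nodes[node_id] = chunk
            node_ids.append(node_id)
        self.doc_to_nodes[doc_id] = node_ids

    def delete(self, doc_id: str) -> None:
        # dict.pop raises KeyError for an unknown doc_id, so nothing fails silently.
        for node_id in self.doc_to_nodes.pop(doc_id):
            del self.nodes[node_id]

    def update(self, doc_id: str, new_text: str) -> None:
        # Same delete-then-insert strategy described in the original question.
        self.delete(doc_id)
        self.insert(doc_id, new_text)


store = DocumentAwareStore()
store.insert("essay", "x" * 45)
store.update("essay", "how to make wealth = build a startup")
```

With the mapping kept next to the nodes, the user only ever passes a document id, and the store works out which chunks to drop and re-create.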
@jerryjliu0 what do you think too?
cc @disiok yeah this is a good point, we discussed it during the call earlier today. We think Node objects are fundamentally different from Document objects (since a Document represents the source, unbroken doc, and Nodes represent the lower-level text chunks). That said, it would be interesting to think of a UX where you can update the docstore from a new Document rather than a new Node
I can actually start an issue with more details if you guys want and we can brainstorm more there (or here, whichever is easier). Won't mind trying to implement this myself, should be fun 😄
@jjmachan if you're interested, sure! if it's easier you could create a "WIP" scoping PR that outlines your thoughts first
yep I'll do that then, would be easier