
Updated 4 months ago

Is there an example for upserting into

Is there an example of upserting into Qdrant vector indices (i.e., do we still need to handle the ref_doc_ids manually)? Also, are there plans to implement metadata filtering any time soon?
you should just be able to use index.insert(document), right?

No idea on the timeline for metadata filters here. Would love a PR though if it's something you really need 🙏

Spot that needs TLC:
https://github.com/jerryjliu/llama_index/blob/main/llama_index/vector_stores/qdrant.py#L226
I wasn't sure what the final state of the ref_doc_id stuff was. I saw a few GitHub issues about it and wasn't sure whether it still just inserted extra copies or actually replaced existing entries.
Yeah, I saw that note lol. If nobody has any concrete plans to do it, I'll probably give it a go.
Yea, it looks like ref_doc_id is handled? At least from what I see in the source code lol
https://github.com/jerryjliu/llama_index/blob/main/llama_index/vector_stores/qdrant.py#L90

Based on this, the doc ids are the resulting node ids, right? So if I want to replace a specific entry that already exists in Qdrant, I need to know its id in Qdrant beforehand and then set the id of the node I'm inserting to that id so it actually gets replaced?
yeaaa that's true 😦
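To make the copies-vs-replacement distinction concrete, here is a minimal sketch using a hypothetical in-memory stand-in (`ToyCollection`, not the qdrant-client API) that mimics Qdrant's upsert semantics: writing a point whose id already exists overwrites it, while a new id adds another point. It assumes, as discussed above, that the node id is what ends up as the point id.

```python
import uuid


class ToyCollection:
    """Hypothetical stand-in for a vector collection: point id -> point.

    Mimics upsert semantics only (existing id is overwritten, new id is added);
    this is NOT the qdrant-client API.
    """

    def __init__(self):
        self.points = {}

    def upsert(self, point_id, vector, payload):
        # Dict assignment: same key overwrites, new key adds an entry.
        self.points[point_id] = {"vector": vector, "payload": payload}


def count_after_reinsert(reuse_id: bool) -> int:
    coll = ToyCollection()
    old_id = str(uuid.uuid4())
    coll.upsert(old_id, [0.1, 0.2], {"text": "v1", "ref_doc_id": "doc-1"})
    # Write the updated version under the old id, or under a fresh random one.
    new_id = old_id if reuse_id else str(uuid.uuid4())
    coll.upsert(new_id, [0.3, 0.4], {"text": "v2", "ref_doc_id": "doc-1"})
    return len(coll.points)


print(count_after_reinsert(reuse_id=False))  # 2 (old copy left behind)
print(count_after_reinsert(reuse_id=True))   # 1 (replaced in place)
```

A fresh random id per insert is exactly why re-ingesting a document produces duplicates unless the old id is known up front.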

For non-vector-store integrations this process has been largely automated (i.e., index.ref_doc_info shows each ingested ref_doc_id and the ids of all its nodes, and there are methods like update_ref_doc, insert_ref_doc, etc.).
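A rough sketch of that bookkeeping, assuming a toy index kept entirely in Python dicts. The method names echo the ones mentioned above, but the class, signatures, and behavior here are illustrative assumptions, not LlamaIndex's implementation. The idea is that replacement needs a ref_doc_id → node ids map, so every node of the old version can be deleted before the new version is inserted.

```python
import uuid
from collections import defaultdict


class ToyDocStoreIndex:
    """Hypothetical toy index tracking which node ids came from which ref_doc_id."""

    def __init__(self):
        self.nodes = {}                        # node_id -> chunk text
        self.ref_doc_info = defaultdict(list)  # ref_doc_id -> [node_id, ...]

    def insert_ref_doc(self, ref_doc_id, chunks):
        for chunk in chunks:
            node_id = str(uuid.uuid4())
            self.nodes[node_id] = chunk
            self.ref_doc_info[ref_doc_id].append(node_id)

    def delete_ref_doc(self, ref_doc_id):
        # Remove every node recorded for this source document.
        for node_id in self.ref_doc_info.pop(ref_doc_id, []):
            self.nodes.pop(node_id, None)

    def update_ref_doc(self, ref_doc_id, chunks):
        # Replacement = delete all nodes of the old version, then insert the new one.
        self.delete_ref_doc(ref_doc_id)
        self.insert_ref_doc(ref_doc_id, chunks)


index = ToyDocStoreIndex()
index.insert_ref_doc("doc-1", ["a", "b", "c"])
index.update_ref_doc("doc-1", ["a2", "b2"])
print(len(index.nodes))  # 2
```

With a vector store integration, that map would have to live in (or be recoverable from) the vector db itself, which is why it ends up being per-vector-store work.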

But since each vector index stores the entire index in the vector db, automating that stuff means implementing it on a per-vector-store basis, and I haven't gotten to it yet lol
Hmm, might it be worth implementing an explicit update method? I wouldn't fold it into the existing insert method, since it seems better to leave some flexibility and let people set the node ids manually. But for people who expect an update to replace existing entries, you could build it on top of metadata filtering, which I'd like to add either way.

On the other hand, with metadata filtering implemented, you could easily retrieve the ids and manually set them for the nodes.
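Here is a sketch of that approach, assuming each stored point carries its ref_doc_id in its payload. `find_ids_by_metadata` and `replace_ref_doc` are hypothetical helpers over a plain dict; against a real Qdrant collection the lookup would presumably be a filtered scroll or search, which is not shown here. The ids found are reused so the write overwrites instead of duplicating, and leftover ids are deleted when the new version has fewer chunks.

```python
import uuid

# Toy collection: point id -> payload (vectors omitted for brevity).
points = {
    "id-1": {"ref_doc_id": "doc-1", "text": "old chunk 1"},
    "id-2": {"ref_doc_id": "doc-1", "text": "old chunk 2"},
    "id-3": {"ref_doc_id": "doc-2", "text": "other doc"},
}


def find_ids_by_metadata(points, key, value):
    """Hypothetical metadata filter: ids of points whose payload[key] == value."""
    return [pid for pid, payload in points.items() if payload.get(key) == value]


def replace_ref_doc(points, ref_doc_id, new_texts):
    old_ids = find_ids_by_metadata(points, "ref_doc_id", ref_doc_id)
    # Reuse existing ids first so the write overwrites instead of duplicating;
    # only extra chunks beyond the old count get fresh ids.
    for i, text in enumerate(new_texts):
        pid = old_ids[i] if i < len(old_ids) else str(uuid.uuid4())
        points[pid] = {"ref_doc_id": ref_doc_id, "text": text}
    # If the new version has fewer chunks, delete the leftover old points.
    for pid in old_ids[len(new_texts):]:
        del points[pid]


replace_ref_doc(points, "doc-1", ["new chunk 1"])
print(sorted(points))  # ['id-1', 'id-3']
```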
Yea, there's still an insert_nodes function that would technically overwrite existing nodes if the doc_id of each node is consistent (I think?). But yea, getting those ids is the painful part.
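One way to keep node ids consistent without a lookup, sketched here as an assumption rather than anything insert_nodes does for you, is to derive them deterministically from the ref_doc_id and the chunk position with uuid5 (Qdrant accepts UUIDs as point ids). `stable_node_id` and `NODE_NAMESPACE` are hypothetical names.

```python
import uuid

# Arbitrary fixed namespace for this sketch; any constant UUID would do,
# as long as it never changes between ingestion runs.
NODE_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def stable_node_id(ref_doc_id: str, chunk_index: int) -> str:
    """Hypothetical helper: same (ref_doc_id, chunk_index) -> same UUID string."""
    return str(uuid.uuid5(NODE_NAMESPACE, f"{ref_doc_id}:{chunk_index}"))


# Re-ingesting doc-1 yields the same ids, so an upsert would overwrite in place.
first_run = [stable_node_id("doc-1", i) for i in range(3)]
second_run = [stable_node_id("doc-1", i) for i in range(3)]
print(first_run == second_run)  # True
```

The catch: if a document is re-split into fewer chunks, the trailing old ids are never overwritten, so they would still need to be deleted separately (e.g., via a metadata filter).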
Ok, I'll probably look into the metadata filtering bit first then; seems like that would solve a bunch of problems lol