Updated last year

Doc id filters

Possible bug in VectorIndexRetriever?

I've tried both llama_index 0.6.7 and 0.7.1, and in both versions the node_ids and doc_ids filters are not working at all for this query. No matter what I pass, it always returns all nodes in the index instead of restricting the results to the node_ids or doc_ids provided in the parameter, which is how the documentation suggests using it.

VectorIndexRetriever(index, similarity_top_k=10, doc_ids=list_of_doc_ids).retrieve("hello")
VectorIndexRetriever(index, similarity_top_k=10, node_ids=list_of_node_ids).retrieve("hello")
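To make the expected behaviour concrete, here is a minimal pure-Python sketch of "filter first, then take the top k". It is not the library's implementation: `Candidate`, `filter_then_top_k`, and the toy IDs and scores are all hypothetical names and values made up for illustration.

```python
from typing import Iterable, List, NamedTuple, Optional


class Candidate(NamedTuple):
    """Hypothetical stand-in for an indexed node with a precomputed score."""
    node_id: str
    doc_id: str
    score: float  # similarity to the query; higher is better


def filter_then_top_k(
    candidates: Iterable[Candidate],
    top_k: int,
    node_ids: Optional[List[str]] = None,
    doc_ids: Optional[List[str]] = None,
) -> List[Candidate]:
    """Keep only candidates matching the given filters, then return the best top_k."""
    allowed_nodes = set(node_ids) if node_ids is not None else None
    allowed_docs = set(doc_ids) if doc_ids is not None else None
    kept = [
        c for c in candidates
        if (allowed_nodes is None or c.node_id in allowed_nodes)
        and (allowed_docs is None or c.doc_id in allowed_docs)
    ]
    # Rank the surviving candidates by similarity, best first.
    return sorted(kept, key=lambda c: c.score, reverse=True)[:top_k]


# Toy data (assumed values, not from a real index).
candidates = [
    Candidate("n1", "docA", 0.91),
    Candidate("n2", "docA", 0.40),
    Candidate("n3", "docB", 0.87),
    Candidate("n4", "docC", 0.75),
]

by_doc = filter_then_top_k(candidates, top_k=10, doc_ids=["docA"])
by_node = filter_then_top_k(candidates, top_k=10, node_ids=["n3", "n4"])
print([c.node_id for c in by_doc])   # ['n1', 'n2']
print([c.node_id for c in by_node])  # ['n3', 'n4']
```

With a filter in place, the result should never contain more nodes than the filter allows, even when similarity_top_k is larger.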
8 comments
Is this using the default vector index, or some vector db integration?
default vector index
I had already noticed that 'filter' isn't implemented for SimpleVectorStore, so I decided to try this instead, but ran into this problem. I've raised the issue on GitHub.
Thanks for raising the issue. I have a moment right now, let's see if I can replicate it (it's really weird that this doesn't work, it's even used under the hood to avoid some errors 😅)
Hmmm, it seems to work for me
>>> from llama_index import VectorStoreIndex, SimpleDirectoryReader
>>> from llama_index.retrievers import VectorIndexRetriever
>>> documents = SimpleDirectoryReader("./data/paul_graham").load_data()
>>> index = VectorStoreIndex.from_documents(documents)
>>> node_ids = list(index.docstore.docs.keys())
>>> nodes = VectorIndexRetriever(index, similarity_top_k=10, node_ids=node_ids[:2]).retrieve("hello")
>>> nodes[0].node.node_id
'ec5966b8-7e20-44fd-8b23-a458174ad138'
>>> node_ids[:2]
['ec5966b8-7e20-44fd-8b23-a458174ad138', '62515e3a-5caa-4930-beb4-2adaf35a9e70']
>>> nodes[1].node.node_id
'62515e3a-5caa-4930-beb4-2adaf35a9e70'
>>> nodes[2].node.node_id
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> 
I only included 2 node IDs, and it only returned 2 nodes, both of which had the expected IDs.
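If you want to run the same check against your own index without eyeballing IDs, something like the sketch below might help. `unexpected_node_ids` and `fake_result` are hypothetical helpers, not part of llama_index; the only thing assumed about the retriever output is the `.node.node_id` attribute seen in the session above. The stand-in objects and placeholder IDs are there so the snippet runs without an index or API key.

```python
from types import SimpleNamespace
from typing import Iterable, List


def unexpected_node_ids(results: Iterable, allowed_node_ids: Iterable[str]) -> List[str]:
    """Return the IDs of retrieved nodes that are NOT in the allowed set."""
    allowed = set(allowed_node_ids)
    return [r.node.node_id for r in results if r.node.node_id not in allowed]


def fake_result(node_id: str) -> SimpleNamespace:
    """Hypothetical stand-in for one retriever result exposing .node.node_id."""
    return SimpleNamespace(node=SimpleNamespace(node_id=node_id))


# Placeholder IDs, made up for this example.
allowed = ["id-1", "id-2"]

filtered_ok = [fake_result("id-1"), fake_result("id-2")]
filter_ignored = filtered_ok + [fake_result("id-3")]

leaks_ok = unexpected_node_ids(filtered_ok, allowed)
leaks_bad = unexpected_node_ids(filter_ignored, allowed)
print(leaks_ok)   # []
print(leaks_bad)  # ['id-3']
```

With a real index you would pass the list returned by `.retrieve("hello")` as `results` and your `node_ids` slice as `allowed_node_ids`; an empty list means the filter was respected.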