Updated 2 years ago

Doc id filters

At a glance

A community member reported a possible bug in the VectorIndexRetriever from the llamaindex library, where the node_ids and doc_ids filters do not seem to be working as expected. The issue persists across versions 0.6.7 and 0.7.1. The community member noted that the retriever always returns all nodes in the index, regardless of the provided filters.

In the comments, another community member asked if the issue was with the default vector index or a vector database integration. The original poster confirmed it was with the default vector index.

The community member then mentioned that they had already noted the 'filter' not being implemented for the SimpleVectorStore, and had raised the issue on the project's GitHub repository. Another community member thanked them for raising the issue and attempted to replicate the problem, but found that the filtering seemed to work for them.

The community member who tried to replicate the issue could not reproduce it. They provided a code example showing the node_ids filter working as expected: the retriever returned only the nodes with the specified IDs.

There is no explicitly marked answer in the comments.

Possible bug in VectorIndexRetriever?

I've tried both llamaindex 0.6.7 and 0.7.1, and in both versions the node_ids and doc_ids filters are not working at all for this query. The retriever always returns all nodes in the index instead of restricting results to the node_ids or doc_ids passed as parameters, which is how the documentation suggests using them.

VectorIndexRetriever(index, similarity_top_k=10, doc_ids=list_of_doc_ids).retrieve("hello")
VectorIndexRetriever(index, similarity_top_k=10, node_ids=list_of_node_ids).retrieve("hello")
8 comments
Is this using the default vector index, or some vector db integration?
default vector index
I had already noted that 'filter' isn't implemented for SimpleVectorStore, so I decided to try this instead, but ran into this error. I raised the issue on GitHub.
Thanks for raising the issue. I have a moment right now, let's see if I can replicate (it's really weird this doesn't work, it's even used under the hood to avoid some errors 😅)
Hmmm, it seems to work for me
>>> from llama_index import VectorStoreIndex, SimpleDirectoryReader
>>> from llama_index.retrievers import VectorIndexRetriever
>>> documents = SimpleDirectoryReader("./data/paul_graham").load_data()
>>> index = VectorStoreIndex.from_documents(documents)
>>> node_ids = list(index.docstore.docs.keys())
>>> nodes = VectorIndexRetriever(index, similarity_top_k=10, node_ids=node_ids[:2]).retrieve("hello")
>>> nodes[0].node.node_id
'ec5966b8-7e20-44fd-8b23-a458174ad138'
>>> node_ids[:2]
['ec5966b8-7e20-44fd-8b23-a458174ad138', '62515e3a-5caa-4930-beb4-2adaf35a9e70']
>>> nodes[1].node.node_id
'62515e3a-5caa-4930-beb4-2adaf35a9e70'
>>> nodes[2].node.node_id
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> 
I only included 2 node IDs, and it returned only 2 nodes, both of which had the expected IDs.
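
For anyone who wants to check this on their own index, here is a minimal sketch of the same comparison done programmatically rather than by eye. It does not import llama_index. It only assumes that each retrieved result exposes `.node.node_id`, as in the session above. `FakeNode`, `FakeResult`, `unexpected_ids`, and `keep_allowed` are hypothetical names made up for illustration, and `keep_allowed` is a client-side fallback, not something the library provides.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class FakeNode:
    """Hypothetical stand-in for a node; only node_id is used here."""
    node_id: str


@dataclass
class FakeResult:
    """Hypothetical stand-in for one retrieved result exposing `.node`."""
    node: FakeNode
    score: float = 0.0


def unexpected_ids(results: Iterable, allowed_ids: Iterable[str]) -> Set[str]:
    """Return node IDs that appear in the results but not in allowed_ids.

    An empty set means every returned node respected the filter.
    """
    allowed = set(allowed_ids)
    return {r.node.node_id for r in results} - allowed


def keep_allowed(results: Iterable, allowed_ids: Iterable[str]) -> List:
    """Client-side fallback: drop results whose node_id is not allowed."""
    allowed = set(allowed_ids)
    return [r for r in results if r.node.node_id in allowed]


if __name__ == "__main__":
    # Simulate a retriever that ignored the filter and returned three nodes.
    results = [FakeResult(FakeNode(i)) for i in ("a", "b", "c")]
    print(unexpected_ids(results, ["a", "b"]))
    print([r.node.node_id for r in keep_allowed(results, ["a", "b"])])
```

With a real retriever, `results` would be the list returned by `.retrieve("hello")` and `allowed_ids` the list passed as `node_ids`. Note that filtering after retrieval can leave fewer than `similarity_top_k` results, so it is a stopgap rather than a substitute for a working filter.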