
Updated 2 months ago

Hello, I am trying to do a

Hello, I am trying to build a CustomRetriever, but I would like to exclude documents that have certain metadata (what I want to exclude is dynamic, based on the user query). I only found a way to do the opposite using metadata filters: they seem to return only documents that match the rule exactly, with no way to exclude them. Any ideas on how to do this in a clean way, besides looping through the vector nodes?

Thank you for your time
13 comments
Besides this, since removing the document from the nodes is fairly easy to do without any filters (even though I would prefer to use them), I was wondering: how can I get all documents?
def _retrieve(self, query_bundle: QueryBundle) -> list[NodeWithScore]:
    """Retrieve nodes given query."""

    print(query_bundle.query_str)

    jira_key = extract_jira_issue_id(query_bundle.query_str)

    # Get document that has this jira key in metadata -> Allows to insert the description in the query

    vector_nodes = self._vector_retriever.retrieve(query_bundle)

    # I would easily be able to remove the doc from the nodes here
    return vector_nodes
Looking for something like a get_all_nodes in the VectorIndexRetriever
You can only retrieve all nodes if you are using the default vector index, or you enabled the docstore override

index.docstore.docs will return a dict of id -> node for all data
but it seems like it only allows me to get only documents that follow the exact rule and not to exclude them

Yea, been meaning to update the default vector store to support this, haven't done it yet
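Until exclusion filters are supported, the nodes can be filtered after retrieval. Here is a minimal sketch of that idea, assuming each retrieved item exposes `node.metadata` as a dict. The `Node` and `NodeWithScore` dataclasses below are simplified stand-ins so the snippet runs on its own, not the real LlamaIndex classes, and `exclude_by_metadata` and the `jira_key` values are made-up names for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """Stand-in for a LlamaIndex node; only the metadata dict matters here."""
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class NodeWithScore:
    """Stand-in for the NodeWithScore wrapper returned by a retriever."""
    node: Node
    score: float


def exclude_by_metadata(
    nodes: list[NodeWithScore], key: str, excluded_value: Any
) -> list[NodeWithScore]:
    """Keep nodes whose metadata[key] differs from excluded_value.

    Nodes that lack the key entirely are kept.
    """
    return [n for n in nodes if n.node.metadata.get(key) != excluded_value]


# Hypothetical retrieval result.
retrieved = [
    NodeWithScore(Node("issue description", {"jira_key": "ABC-1"}), 0.9),
    NodeWithScore(Node("related comment", {"jira_key": "ABC-2"}), 0.8),
    NodeWithScore(Node("no key"), 0.7),
]

kept = exclude_by_metadata(retrieved, "jira_key", "ABC-1")
```

Inside `_retrieve`, the same filter would be applied to the list returned by `self._vector_retriever.retrieve(query_bundle)`. Since filtering shrinks the result, it may be worth retrieving a few more nodes than needed.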
So this should allow me to have all docs right?

class CustomRetriever(BaseRetriever):
    """Custom retriever that performs both semantic search and hybrid search."""

    def __init__(
        self,
        vector_retriever: VectorIndexRetriever,
    ) -> None:
        """Init params."""

        self._vector_retriever = vector_retriever
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Retrieve nodes given query."""

        all_docs = self._vector_retriever._index.docstore.docs

        vector_nodes = self._vector_retriever.retrieve(query_bundle)

        return vector_nodes
Not quite -- you didn't use all_docs anywhere

Maybe something like

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Retrieve nodes given query."""

        all_data = list(self._vector_retriever._index.docstore.docs.values())
        all_nodes = [NodeWithScore(node=x, score=1.0) for x in all_data]
        return all_nodes
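Once all nodes are available as an id -> node dict, finding the one that carries a given Jira key is a plain dictionary scan. This is only a sketch: `find_by_metadata` is a hypothetical helper, the `docs` dict is a toy stand-in for `index.docstore.docs`, and the ids, texts, and `jira_key` values are invented.

```python
from types import SimpleNamespace
from typing import Any


def find_by_metadata(docs: dict[str, Any], key: str, value: Any) -> list[Any]:
    """Return every node in an id -> node mapping whose metadata[key] equals value."""
    return [
        node
        for node in docs.values()
        # Tolerate nodes with a missing or None metadata attribute.
        if (getattr(node, "metadata", None) or {}).get(key) == value
    ]


# Toy stand-in for index.docstore.docs (hypothetical ids and metadata).
docs = {
    "id-1": SimpleNamespace(text="Login fails on SSO", metadata={"jira_key": "ABC-1"}),
    "id-2": SimpleNamespace(text="Export is slow", metadata={"jira_key": "ABC-2"}),
}

matches = find_by_metadata(docs, "jira_key", "ABC-1")
```

An empty dict simply yields an empty list, so the caller should check `matches` before using the description in the query.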
Yes, I didn't apply my logic; I just wanted to make sure I could get all the docs. I need them to edit my query based on some rules (should just be normal programming).
Thanks a lot!
@Logan M Hey, any idea how all_data could be empty, when self._vector_retriever.retrieve(query_bundle) still returns the 10 sources?
all_data could be empty if you are using a vector db integration
Oh, so I could just query the db
or even just use a retriever with the metadata filter I need and get it from the nodes