Easiest Way to Fetch Document ID and Text from Node with Score

May be missing it somewhere, but is there a notebook that shows the easiest way to fetch the document id from a node_with_score and return the text from that whole document?

12 comments

Logan M

Assuming you stored the original documents somewhere (like in a docstore)

Plain Text

node = retriever.retrieve("query")[0]
doc = docstore.get_document(node.ref_doc_id)

If you aren't using a docstore or didn't store the original document, you will have to use the relationships to pull all the nodes back together. This would probably be a recursive function but I'm too lazy to figure out the specifics lol

But something generally like this

Plain Text

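all_nodes = []
# retrieve() returns NodeWithScore objects; the relationships live on .node
cur_node = retriever.retrieve("query")[0].node
all_nodes.append(cur_node)

# prev_node / next_node are references to the neighbouring chunks (or None)
if cur_node.prev_node:
   prev_node = vector_store.get_nodes(node_ids=[cur_node.prev_node.node_id])
if cur_node.next_node:
   next_node = vector_store.get_nodes(node_ids=[cur_node.next_node.node_id])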
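For anyone who wants the full walk rather than a single step, here is a rough sketch using loops instead of recursion. It assumes each node exposes prev_node / next_node references with a node_id, and that the store's get_nodes(node_ids=[...]) returns a list, as in the snippet above. The function name collect_linked_nodes is made up for illustration.

```python
def collect_linked_nodes(start_node, vector_store):
    """Follow prev/next links outward from start_node; return nodes in document order."""

    def fetch(ref):
        # get_nodes returns a list; an empty list means the neighbour was not stored
        found = vector_store.get_nodes(node_ids=[ref.node_id])
        return found[0] if found else None

    seen = {start_node.node_id}  # guards against cyclic links
    before, after = [], []

    cur = start_node
    while cur.prev_node is not None:
        cur = fetch(cur.prev_node)
        if cur is None or cur.node_id in seen:
            break
        seen.add(cur.node_id)
        before.append(cur)

    cur = start_node
    while cur.next_node is not None:
        cur = fetch(cur.next_node)
        if cur is None or cur.node_id in seen:
            break
        seen.add(cur.node_id)
        after.append(cur)

    # 'before' was collected walking backwards, so flip it
    return list(reversed(before)) + [start_node] + after
```

Usage would be something like collect_linked_nodes(retriever.retrieve("query")[0].node, vector_store). This makes one store lookup per chunk, so it can be slow for long documents, and if the chunks were created with overlap, simply joining their text will repeat the overlapping parts.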

DDS

Yeah I was getting AttributeError: 'NodeWithScore' object has no attribute 'ref_doc_id'
with docstore.get_document(node.ref_doc_id).text

I guess you just can't fetch it from the NodeWithScore object. Would this also work?

docstore.get_node(node.node.relationships["source"]).get_content()

Logan M

node.node.ref_doc_id should work, mb

Logan M

the latter could work, but I think you need .node_id after getting the source
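Putting the two suggestions together, a minimal sketch might look like the following. It assumes the retrieved object is a NodeWithScore whose wrapped node exposes source_node (the "source" relationship, with a node_id), and that the docstore actually holds the original document. As far as I know node.ref_doc_id is shorthand for the same source ID. The helper names are hypothetical.

```python
def source_document_id(node_with_score):
    """Return the ID of the document a retrieved node came from, or None."""
    # retrieve() returns NodeWithScore wrappers; relationships live on .node
    node = node_with_score.node
    source = node.source_node  # relationship reference, or None if not set
    return source.node_id if source is not None else None


def full_document_text(node_with_score, docstore):
    """Look up the source document in the docstore and return its text."""
    doc_id = source_document_id(node_with_score)
    if doc_id is None:
        raise ValueError("node has no source relationship")
    return docstore.get_document(doc_id).get_content()
```

Usage would be full_document_text(retriever.retrieve("query")[0], index.docstore), which only works when the docstore was populated at ingestion time.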

DDS

also, is there a way to quickly list the first x documents from the docstore?

Logan M

negatory -- it's a key-val setup, so there's no concept of a "list" or order
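That said, if all you need is an arbitrary sample rather than a meaningful "first x", one possible workaround is to slice the docstore's docs mapping. This sketch assumes the docstore exposes docs as a dict of doc_id to document; the helper name peek_documents is made up, and the order is whatever the mapping happens to yield.

```python
from itertools import islice


def peek_documents(docstore, x):
    """Return up to x (doc_id, document) pairs, in no meaningful order."""
    # islice stops early, so the whole mapping is never copied
    return list(islice(docstore.docs.items(), x))
```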

DDS

got it. I'm currently getting a docs={} for the docstore I fetch via index.docstore from an index that I successfully retrieve from - any idea how that could happen?

DDS

just a little context: I'm currently retrieving nodes from the index and fetching their doc_id, but then it is unretrievable from the docstore (even though I can search and find that exact document_id in my vector db), and it seems to be because the docs property is empty

Logan M

The docstore is not typically populated automatically when using a vector db that can store nodes.

Most vector dbs serialize and store nodes directly

DDS

Got it, so you just use the document_id as metadata when fetching specific documents from the vector store, as opposed to using docstore.get_document(node.ref_doc_id)? i.e. just use node.node.ref_doc_id as a metadata filter for your vector store

DDS

just aiming for the most straightforward way to fetch full documents via nodes with a Qdrant vectorstore

Logan M

yea using the ID as a filter in qdrant is another good idea, you can do something like vector_store.get_nodes(filters=MetadataFilters(filters=[MetadataFilter(key="ref_doc_id", value=ref_doc_id)]))
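Spelled out a bit more, the filter approach could be sketched like this. It assumes the MetadataFilter / MetadataFilters classes are importable from llama_index.core.vector_stores and that the store keeps ref_doc_id in each node's metadata. The stitch_text helper is hypothetical: it further assumes each chunk carries start_char_idx / end_char_idx offsets into the original document, and falls back to a plain join when they are missing.

```python
def fetch_document_nodes(vector_store, ref_doc_id):
    """Fetch every stored node whose ref_doc_id metadata matches."""
    # Imported here so stitch_text below can be used without llama_index installed
    from llama_index.core.vector_stores import MetadataFilter, MetadataFilters

    filters = MetadataFilters(
        filters=[MetadataFilter(key="ref_doc_id", value=ref_doc_id)]
    )
    return vector_store.get_nodes(filters=filters)


def stitch_text(nodes):
    """Rebuild document text from chunks, trimming overlap when offsets are known."""
    if any(n.start_char_idx is None or n.end_char_idx is None for n in nodes):
        # No offsets: true order is unknown, so just concatenate what came back
        return "\n".join(n.get_content() for n in nodes)

    parts, pos = [], 0
    for n in sorted(nodes, key=lambda n: n.start_char_idx):
        text = n.get_content()
        # Drop the prefix that an earlier chunk already covered
        parts.append(text[max(0, pos - n.start_char_idx):])
        pos = max(pos, n.end_char_idx)
    return "".join(parts)
```

Usage would be roughly stitch_text(fetch_document_nodes(vector_store, node.node.ref_doc_id)). Any text that fell between chunks (for example whitespace stripped by the splitter) is not recovered, so treat the result as an approximation of the original document.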
