
Easiest Way to Fetch Document ID and Text from Node with Score

I may be missing it somewhere, but is there a notebook that shows the easiest way to fetch the document ID from a NodeWithScore and return the text of that whole document?
Assuming you stored the original documents somewhere (like in a docstore)

```python
node = retriever.retrieve("query")[0]
doc = docstore.get_document(node.ref_doc_id)
```


If you aren't using a docstore or didn't store the original document, you will have to use the relationships to pull all the nodes back together. This would probably be a recursive function but I'm too lazy to figure out the specifics lol

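But something generally like this:
```python
all_nodes = []
# retrieve() returns NodeWithScore wrappers; the relationships live on .node
cur_node = retriever.retrieve("query")[0].node

# fetch the immediate neighbors through the prev/next relationships
if cur_node.prev_node:
    prev_node = vector_store.get_nodes(node_ids=[cur_node.prev_node.node_id])[0]
    all_nodes.append(prev_node)
all_nodes.append(cur_node)
if cur_node.next_node:
    next_node = vector_store.get_nodes(node_ids=[cur_node.next_node.node_id])[0]
    all_nodes.append(next_node)
```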
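The snippet above only reaches one neighbor in each direction. Here is a minimal sketch of the full walk the reply hints at, written as a loop instead of recursion so a long document can't run into Python's recursion limit. It uses hypothetical stand-in classes (StubRelated, StubChunk) instead of real LlamaIndex nodes, and a hypothetical get_node callback instead of the vector store lookup. With a real index you would start from retriever.retrieve("query")[0].node and pass something like lambda nid: vector_store.get_nodes(node_ids=[nid])[0], assuming your vector store implements get_nodes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class StubRelated:
    # Hypothetical stand-in for a relationship entry: just the related node's id.
    node_id: str


@dataclass
class StubChunk:
    # Hypothetical stand-in for a stored node: id, text, and prev/next links.
    node_id: str
    text: str
    prev_node: Optional[StubRelated] = None
    next_node: Optional[StubRelated] = None


def collect_document_chunks(
    start: StubChunk, get_node: Callable[[str], StubChunk]
) -> List[StubChunk]:
    """Follow prev/next links from `start` and return all chunks in document order."""
    before: List[StubChunk] = []
    node = start
    while node.prev_node is not None:  # walk backwards to the first chunk
        node = get_node(node.prev_node.node_id)
        before.append(node)
    before.reverse()  # collected last-to-first, so flip into document order

    after: List[StubChunk] = []
    node = start
    while node.next_node is not None:  # walk forwards to the last chunk
        node = get_node(node.next_node.node_id)
        after.append(node)

    return before + [start] + after


# Usage: a three-chunk "document" held in a dict that plays the vector store.
store: Dict[str, StubChunk] = {
    "a": StubChunk("a", "First part. ", next_node=StubRelated("b")),
    "b": StubChunk("b", "Second part. ", prev_node=StubRelated("a"), next_node=StubRelated("c")),
    "c": StubChunk("c", "Third part.", prev_node=StubRelated("b")),
}
chunks = collect_document_chunks(store["b"], store.__getitem__)
full_text = "".join(c.text for c in chunks)
print(full_text)  # prints: First part. Second part. Third part.
```

Plain concatenation assumes the chunks don't overlap; if your splitter was configured with chunk overlap, full_text will contain those overlapping spans twice.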
Yeah, I was getting AttributeError: 'NodeWithScore' object has no attribute 'ref_doc_id' with docstore.get_document(node.ref_doc_id).text

I guess you just can't fetch it from the NodeWithScore object. Would this also work?

docstore.get_node(node.node.relationships["source"]).get_content()
node.node.ref_doc_id should work, mb
the latter could work, but I think you need .node_id after getting the source
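To make the two routes concrete: as far as I know, a node's ref_doc_id in LlamaIndex is read off its source relationship (the NodeRelationship.SOURCE entry, also exposed as node.node.source_node), so node.node.ref_doc_id and the source relationship's .node_id should point at the same document. The sketch below models that with hypothetical stub classes and a plain dict as the docstore, so it runs without LlamaIndex installed; it illustrates the idea and is not the library's code.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StubRelated:
    # Hypothetical stand-in for a relationship entry: just the related id.
    node_id: str


@dataclass
class StubNode:
    # Hypothetical stand-in for a text node.
    text: str
    source: Optional[StubRelated] = None  # plays the role of the SOURCE relationship

    @property
    def ref_doc_id(self) -> Optional[str]:
        # ref_doc_id is just the source relationship's node_id.
        return self.source.node_id if self.source is not None else None


@dataclass
class StubNodeWithScore:
    # Hypothetical stand-in for the retriever's wrapper: a node plus a score.
    node: StubNode
    score: float = 0.0


def fetch_full_document(hit: StubNodeWithScore, docstore: Dict[str, str]) -> str:
    # Unwrap first: the wrapper itself has no ref_doc_id attribute.
    doc_id = hit.node.ref_doc_id
    if doc_id is None:
        raise KeyError("node has no source relationship")
    return docstore[doc_id]  # docstore modeled as a plain id -> text dict


docstore = {"doc-1": "The whole original document text."}
hit = StubNodeWithScore(StubNode("one chunk", StubRelated("doc-1")), score=0.87)
print(fetch_full_document(hit, docstore))  # prints: The whole original document text.
```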
also, is there a way to quickly list the first x documents from the docstore?
negatory, it's a key-value store, so there's no concept of a "list" or order
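If the goal is only to eyeball a few entries, a key-value mapping can still hand back an arbitrary handful; just don't read it as "the first x". A minimal sketch, assuming you can get the store's contents as a dict of id to document (the docs property mentioned later in the thread is such a dict, as far as I know). Here a plain dict stands in for it, and peek_docs is a hypothetical helper.

```python
from itertools import islice
from typing import Dict, List, Tuple


def peek_docs(docs: Dict[str, str], x: int) -> List[Tuple[str, str]]:
    """Return up to x (doc_id, text) pairs from a key-value mapping.

    This is a sample, not "the first x": which entries come back depends on
    how the store iterates its keys, not on any document order.
    """
    return list(islice(docs.items(), x))


docs = {"id-3": "third", "id-1": "first", "id-2": "second"}
sample = peek_docs(docs, 2)
print(len(sample))  # prints 2
```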
got it. I'm currently getting docs={} for the docstore I fetch via index.docstore, even though I can successfully retrieve from that same index. Any idea how that could happen?
just a little context: I'm retrieving nodes from the index and reading their doc_id, but that document can't be retrieved from the docstore (even though I can search for and find that exact document_id in my vector db). It seems to be because the docs property is empty.
The docstore usually isn't populated automatically when you use a vector db that can store nodes.

Most vector dbs serialize and store the nodes directly, rather than going through the docstore.
Got it, so you just use the document_id as metadata when fetching specific documents from the vector store, as opposed to using docstore.get_document(node.ref_doc_id)? i.e., just use node.node.ref_doc_id as a metadata filter for your vector store
just aiming for the most straightforward way to fetch full documents via nodes with a Qdrant vectorstore
yea, using the ID as a filter in Qdrant is another good idea. You can do something like vector_store.get_nodes(filters=MetadataFilters(filters=[MetadataFilter(key="ref_doc_id", value=ref_doc_id)]))
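Once the filter returns every chunk of a document, you still have to stitch them back together, and they may not come back in document order. A minimal sketch, with a hypothetical StoredChunk class standing in for the nodes get_nodes returns and a plain list standing in for the Qdrant collection. It assumes each chunk carries ref_doc_id in its metadata (the key the filter above matches on) and a start_char_idx offset; LlamaIndex node parsers usually set that offset, but it may be missing for your data, so chunks without one are put last.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StoredChunk:
    # Hypothetical stand-in for a node returned by the vector store.
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)
    start_char_idx: Optional[int] = None  # offset of the chunk in its source document


def rebuild_document(chunks: List[StoredChunk], ref_doc_id: str) -> str:
    # Same idea as the metadata filter: keep only chunks from this document.
    mine = [c for c in chunks if c.metadata.get("ref_doc_id") == ref_doc_id]
    # Results come back in no guaranteed order; sort by offset, missing offsets last.
    mine.sort(key=lambda c: (c.start_char_idx is None, c.start_char_idx or 0))
    return "".join(c.text for c in mine)


chunks = [
    StoredChunk("world.", {"ref_doc_id": "d1"}, start_char_idx=6),
    StoredChunk("Other doc.", {"ref_doc_id": "d2"}, start_char_idx=0),
    StoredChunk("Hello ", {"ref_doc_id": "d1"}, start_char_idx=0),
]
print(rebuild_document(chunks, "d1"))  # prints: Hello world.
```

Joining the text directly ignores chunk overlap; if the document was split with overlap, expect some repeated text at the seams.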