
Updated 2 months ago

Constructing a Document from a List of TextNode

Hello there, is there a way to construct a Document from a list of TextNode objects?

I have a markdown document from LlamaParse, which I break down into a list of nodes using MarkdownNodeParser. I use node.metadata['Header_1'] to filter those nodes by the markdown headers in my document, and then amend their text.

Now that I have updated llama-index-core, the node.metadata dictionary no longer contains Header_1. For now I add it back manually, but I'm left with a list of updated TextNode objects and don't know how to convert them into a Document.
9 comments
Hey, you can add metadata directly to the node itself, but I'm not quite sure what you mean by this: "Now that I have updated llama-index-core, node.metadata dictionary is missing the Header_1".

But to give you an idea of how you can add metadata to a node:
Plain Text
from llama_index.core.schema import TextNode
node1 = TextNode(text="<text_chunk>", id_="<node_id>")
node1.metadata['Header_1'] = 'ADD_HEADER'
The new code adds a header_path metadata key instead, so you can try accessing that, I suppose. If a section has no header, the value is just /; otherwise it is an actual path, for example /1. Introduction/1.1 Subsection.
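For filtering by top-level header again, a rough sketch along these lines might work. It assumes header_path looks exactly as described above (segments separated by /, with / alone meaning no header) and that header titles contain no / themselves; top_level_header and filter_by_top_header are hypothetical helper names, and SimpleNamespace objects stand in for real TextNode instances so the snippet runs without llama-index installed. Check the paths your own parser version produces before relying on it.

```python
from types import SimpleNamespace


def top_level_header(node):
    """Return the first segment of node.metadata['header_path'], or None."""
    path = node.metadata.get("header_path", "/")
    # Drop empty segments so leading or trailing slashes do not matter.
    segments = [segment for segment in path.split("/") if segment]
    return segments[0] if segments else None


def filter_by_top_header(nodes, header):
    """Keep only the nodes whose header_path starts with the given header."""
    return [node for node in nodes if top_level_header(node) == header]


# Stand-in nodes (assumption: real TextNode objects expose .text and .metadata too).
nodes = [
    SimpleNamespace(text="Preamble", metadata={"header_path": "/"}),
    SimpleNamespace(text="Intro body", metadata={"header_path": "/1. Introduction"}),
    SimpleNamespace(
        text="Sub body",
        metadata={"header_path": "/1. Introduction/1.1 Subsection"},
    ),
    SimpleNamespace(text="Methods body", metadata={"header_path": "/2. Methods"}),
]

intro_nodes = filter_by_top_header(nodes, "1. Introduction")
```

This would replace the old node.metadata['Header_1'] lookup without having to add the key back by hand.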
Yeah, that is what I do. My question was: if I have multiple TextNode objects, is there a way to combine them into a Document object?

Before I updated the llama-index-core module, this wasn't an issue.
With a Document object, you can do
Plain Text
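from llama_index.core import Document
from llama_index.core.node_parser import MarkdownNodeParser

document = Document(text="<markdown_text>")  # the Document to split
parser = MarkdownNodeParser()
nodes = parser.get_nodes_from_documents([document])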


And nodes will be a list of TextNode. Is there a way to combine them back into a Document object? Like an inverse transform operation.
Hope my question is clear, thanks!
I don't think there is a method for the inverse transform, but you can iterate over the nodes, stitch the text together, copy the metadata, and create a final Document.

Plain Text
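from llama_index.core import Document

text = ''
metadata = {}  # Document metadata must be a dict, not a list
for node in nodes:
  text = text + node.text
  metadata.update(node.metadata)  # later nodes overwrite duplicate keys

# now form the Document object using the text and metadata
doc = Document(text=text, metadata=metadata)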
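If per-node keys such as header_path differ between nodes, copying everything into one dictionary leaves the Document with whichever value was seen last. One possible variation, sketched here under my own assumptions, keeps only the metadata that every node agrees on and joins the text with a blank line so sections do not run together. merge_nodes is a hypothetical helper, not part of llama-index, and SimpleNamespace objects stand in for TextNode so it runs on its own.

```python
from types import SimpleNamespace

_MISSING = object()  # sentinel for "key not present on this node"


def merge_nodes(nodes, separator="\n\n"):
    """Join node texts and keep only metadata shared by every node."""
    nodes = list(nodes)
    text = separator.join(node.text for node in nodes)
    metadata = {}
    if nodes:
        # Keep a key only if all nodes carry the same value for it.
        metadata = {
            key: value
            for key, value in nodes[0].metadata.items()
            if all(node.metadata.get(key, _MISSING) == value for node in nodes)
        }
    return text, metadata


# Stand-in nodes (assumption: real TextNode objects expose .text and .metadata too).
nodes = [
    SimpleNamespace(
        text="# Intro\nHello",
        metadata={"file_name": "report.md", "header_path": "/"},
    ),
    SimpleNamespace(
        text="## Details\nWorld",
        metadata={"file_name": "report.md", "header_path": "/Intro/"},
    ),
]

text, metadata = merge_nodes(nodes)

# With llama-index installed, the result can then be wrapped as:
# from llama_index.core import Document
# doc = Document(text=text, metadata=metadata)
```

Whether to drop or keep the differing keys is a judgment call; if the merged Document is going to be parsed again, the per-section metadata gets regenerated anyway.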
Ah, this is what I was looking for. I didn't know metadata could be added to a Document in the same way as to a TextNode.
I will give it a try, thanks @WhiteFang_Jr !