Hi! Is there any way to convert a

At a glance

The community members are discussing how to convert a Document object to a Node object without using a splitter, as the original poster already has their documents split. The suggested solutions include using the SimpleNodeParser or SentenceSplitter from the llama_index library, but the community members note that these may not be the ideal solutions as they involve additional boilerplate or workarounds. One community member suggests that a Document object and a Node object are essentially the same, and provides a simple conversion using the TextNode class from the llama_index schema. However, there is no explicitly marked answer in the comments.

Javier Sanchez

Hi! Is there any way to convert a Document object to a Node object without using any kind of splitter? I have already split my documents and I just want to convert them into nodes.

14 comments

WhiteFang_Jr

Nodes have a fixed maximum size, so your documents should be smaller than that, since the final node is formed from metadata + text.

If your Document chunk size is lower than the default Node size, you can simply do this to create nodes:

from llama_index.node_parser import SimpleNodeParser
node_parser = SimpleNodeParser.from_defaults()
nodes = node_parser.get_nodes_from_documents(documents)

Keep in mind that the above code is for versions < 0.9.

For v0.9 and above, you can create nodes like this:

from llama_index.text_splitter import SentenceSplitter
from llama_index.extractors import TitleExtractor

node_parser = SentenceSplitter(chunk_size=512)
extractor = TitleExtractor()  # optional metadata extractor; not applied below

# use the splitter directly as a transform (callable in v0.9 and above)
nodes = node_parser(documents)

Javier Sanchez

🤔 🤔

Javier Sanchez

The solution of SentenceSplitter(chunk_size=512) could work, since I can set a bigger chunk_size

Javier Sanchez

But not really the solution I am looking for

Javier Sanchez

Why should I use a splitter to create the nodes? I have already done the work of splitting the text to create document objects and I do not need to set the size for either the nodes or the document objects.

Javier Sanchez

Also, I get a 'SentenceSplitter is not callable' error when using it that way.

WhiteFang_Jr

So the splitter has a default chunk size of 1024.

If your documents are smaller than that, they will be converted directly into nodes, with some additions like metadata.

WhiteFang_Jr

There will be no split if doc size < default size
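
To make the "no split below the chunk size" behavior concrete, here is a toy sketch in plain Python. It is not llama_index's implementation: `split_if_needed` is a hypothetical helper, and it counts whitespace-separated words rather than real tokens (and ignores metadata), so treat it only as an illustration of the idea.

```python
def split_if_needed(text: str, chunk_size: int = 1024) -> list[str]:
    """Toy splitter (hypothetical): sizes are counted in words, not tokens."""
    words = text.split()
    if len(words) <= chunk_size:
        # The document fits in one chunk, so it comes back unchanged.
        return [text]
    # Otherwise cut the word list into consecutive chunks of chunk_size words.
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]


short_doc = "word " * 100    # 100 words, below the default chunk size
long_doc = "word " * 2500    # 2500 words, above the default chunk size

print(len(split_if_needed(short_doc)))                  # 1
print(len(split_if_needed(long_doc)))                   # 3
print(len(split_if_needed(long_doc, chunk_size=4096)))  # 1
```

The last call mirrors the workaround discussed here: raising the chunk size above the largest document means nothing gets split.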

Javier Sanchez

The problem is that my documents are bigger

WhiteFang_Jr

You can increase the chunk size then

WhiteFang_Jr

You need to update LlamaIndex. The parsing process has been flattened.

Javier Sanchez

It is a solution but I think that it is a workaround and adds boilerplate. There should be a way to create nodes regardless of the size of the documents/nodes

Javier Sanchez

But anyway, thanks for the response, I can work with that

Logan M

@Javier Sanchez (and @WhiteFang_Jr ) A document and a node are basically the same object (a Document object actually extends the TextNode class lol)

Usually you can pass either into functions and everything will work fine without converting. But if not, a quick conversion might be:

from llama_index.schema import TextNode
nodes = [TextNode(text=document.text, metadata=document.metadata) for document in documents]
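
To illustrate why this works regardless of document size, here is a small sketch that runs without llama_index installed. `ToyTextNode`, `ToyDocument`, `count_chars`, and `to_nodes` are hypothetical stand-ins, not the library's classes; they only mirror the relationship described above (a document type that extends a node type) and the one-node-per-document mapping.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTextNode:
    """Stand-in for a node: text plus metadata (hypothetical, not llama_index)."""
    text: str
    metadata: dict = field(default_factory=dict)


@dataclass
class ToyDocument(ToyTextNode):
    """Stand-in for a document that extends the node class."""


def count_chars(nodes: list[ToyTextNode]) -> int:
    # Accepts nodes, and therefore documents too, because of the inheritance.
    return sum(len(n.text) for n in nodes)


def to_nodes(documents: list[ToyDocument]) -> list[ToyTextNode]:
    # One node per document: no splitting, no size limit involved.
    return [ToyTextNode(text=d.text, metadata=dict(d.metadata)) for d in documents]


documents = [
    ToyDocument(text="a" * 5000, metadata={"source": "first"}),
    ToyDocument(text="b" * 20, metadata={"source": "second"}),
]
nodes = to_nodes(documents)

print(len(nodes))              # 2
print(count_chars(documents))  # 5020
```

Because the conversion is a plain one-to-one copy of text and metadata, the number and size of the nodes are exactly those of the pre-split documents.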
