
Updated 3 months ago


Hi! Is there any way to convert a Document object to a Node object without using any kind of splitter? I have already split my documents and I just want to convert them into nodes.
Node objects are capped at the same size (the parser's chunk size), so your documents should be smaller than that, because the final node is formed by combining the metadata and the text.
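To make the "metadata + text" point concrete, here is a rough sketch of why a document that is just under the chunk size can still end up split. It assumes whitespace word counts as a hypothetical stand-in for the tokenizer and a made-up `key: value` rendering of the metadata; LlamaIndex's real size accounting differs in the details.

```python
from typing import Any, Dict


def count_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


def node_size(text: str, metadata: Dict[str, Any]) -> int:
    """Approximate size of the final node: rendered metadata plus the text itself."""
    # Hypothetical rendering: one "key: value" line per metadata entry.
    metadata_str = "\n".join(f"{key}: {value}" for key, value in metadata.items())
    return count_tokens(metadata_str) + count_tokens(text)


def fits_in_one_node(text: str, metadata: Dict[str, Any], chunk_size: int) -> bool:
    """True if the document would not need to be split under this chunk size."""
    return node_size(text, metadata) <= chunk_size


meta = {"file_name": "a.txt"}
print(node_size("one two three four five", meta))  # 7: 2 metadata words + 5 text words
```

The takeaway is that the budget is shared: the text alone fitting under the chunk size is not enough if the metadata pushes the total over.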

If your documents are smaller than the default node size, you can simply do this to create nodes:

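```python
from llama_index.node_parser import SimpleNodeParser

# `documents` is your existing list of Document objects.
# Documents shorter than the default chunk size pass through as one node each.
node_parser = SimpleNodeParser.from_defaults()
nodes = node_parser.get_nodes_from_documents(documents)
```
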
Keep in mind that the above code is for versions < 0.9.

For v0.9 and above, you can create nodes like this:

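```python
from llama_index.text_splitter import SentenceSplitter
from llama_index.extractors import TitleExtractor

# chunk_size is measured in tokens
node_parser = SentenceSplitter(chunk_size=512)
# optional metadata extractor (calls an LLM to generate titles); not applied in this snippet
extractor = TitleExtractor()

# use transforms directly; calling the splitter like this requires v0.9+
nodes = node_parser(documents)
```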
🤔 🤔
The SentenceSplitter(chunk_size=512) solution could work, since I can set a bigger chunk_size,
but it's not really the solution I'm looking for.
Why should I use a splitter to create the nodes? I have already done the work of splitting the text into Document objects, and I don't need to set a size for either the nodes or the documents.
Also, I get a 'SentenceSplitter is not callable' error when using it that way.
The splitter has a default chunk size of 1024 tokens.

If your documents are smaller than that, each document is converted directly into a single node, with additions such as its metadata.
There will be no split if the document size is below the default chunk size.
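A toy model of the "no split below the chunk size" behavior described above might look like this. It is a simplified sketch, not LlamaIndex's splitter: whitespace words stand in for tokens, and the real SentenceSplitter also respects sentence boundaries and chunk overlap, which this ignores.

```python
from typing import List

DEFAULT_CHUNK_SIZE = 1024  # the default mentioned above; real splitters count tokens, not words


def split_into_chunks(text: str, chunk_size: int = DEFAULT_CHUNK_SIZE) -> List[str]:
    """Toy splitter: whitespace words stand in for tokens."""
    words = text.split()
    if len(words) <= chunk_size:
        # Fits within the chunk size: the document is kept whole and becomes one node.
        return [text]
    # Too big: pack consecutive words into chunks of at most chunk_size words.
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


short_doc = "already split by hand"
long_doc = " ".join(["word"] * 2500)
print(len(split_into_chunks(short_doc)))  # 1
print(len(split_into_chunks(long_doc)))   # 3
```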
The problem is that my documents are bigger.
Then you can increase the chunk size.
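One way to pick a chunk size large enough that none of your pre-split documents gets split again is to size it from the longest document, plus some headroom for metadata. This is a hypothetical helper, not part of LlamaIndex, and it again uses whitespace word counts as an assumed stand-in for the real tokenizer.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Hypothetical stand-in for the splitter's tokenizer: count whitespace-separated words.
    return len(text.split())


def chunk_size_without_splits(texts: List[str], metadata_margin: int = 0) -> int:
    """Smallest chunk size under which none of the texts would be split,
    plus a margin for metadata that shares the same budget."""
    longest = max((count_tokens(t) for t in texts), default=0)
    return longest + metadata_margin


docs = ["first pre-split chunk", "a somewhat longer second pre-split chunk"]
print(chunk_size_without_splits(docs, metadata_margin=50))  # 56
```

The result would then be passed as `chunk_size=...` to the splitter. Since real token counts usually differ from word counts, a generous margin is the safer choice.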
You need to update LlamaIndex to v0.9 or later to call the splitter directly. The parsing process has been flattened (simplified) in the newer versions.
It is a solution, but I think it is a workaround and adds boilerplate. There should be a way to create nodes regardless of the size of the documents/nodes.
But anyway, thanks for the response; I can work with that.
@Javier Sanchez (and @WhiteFang_Jr) a document and a node are basically the same object (the Document class actually extends the TextNode class, lol).

Usually you can pass either one into functions and everything will work fine without converting. But if not, a quick conversion might be:

```python
from llama_index.schema import TextNode

# Copy each Document's text and metadata into a plain TextNode; no splitting happens.
nodes = [TextNode(text=document.text, metadata=document.metadata) for document in documents]
```
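Why this works without any splitter can be shown with a heavily simplified, self-contained model. The two dataclasses below are hypothetical stand-ins that mirror only the "Document extends TextNode" relationship mentioned above; they are not the real `llama_index.schema` classes, which carry many more fields (ids, relationships, embeddings).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TextNode:
    # Hypothetical, simplified stand-in for llama_index.schema.TextNode.
    text: str = ""
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Document(TextNode):
    # Mirrors only the subclass relationship: every Document is already a TextNode.
    pass


def documents_to_nodes(documents: List[Document]) -> List[TextNode]:
    """Copy text and metadata into plain TextNode objects, with no splitting."""
    # dict(...) copies the metadata so editing a node does not mutate its source document.
    return [TextNode(text=doc.text, metadata=dict(doc.metadata)) for doc in documents]


docs = [Document(text="chunk one", metadata={"page": 1}), Document(text="chunk two")]
nodes = documents_to_nodes(docs)
print(isinstance(docs[0], TextNode))  # True: a Document can go wherever a node is expected
print([type(n).__name__ for n in nodes])  # ['TextNode', 'TextNode']
```

Because of the subclass relationship, the explicit conversion is only needed when some code checks for the exact node type; otherwise the documents can be passed as-is.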