In https://github.com/jerryjliu/llama_index/blob/main/llama_index/node_parser/simple.py we have a call to get_nodes_from_document for each document, which asks the text splitter to create nodes taking into account the metadata known so far (at this stage the metadata is either a custom file_metadata or was created by the text splitter/reader -- for instance, the PDF reader adds a page metadata key):
nodes = get_nodes_from_document(
    document,
    self._text_splitter,
    self._include_metadata,
    ...
)
Then, once all nodes have been generated, the metadata extractor runs over the full list:
self._metadata_extractor.process_nodes(all_nodes)
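So the flow, as I read it, is two-phase: per-document node creation (metadata propagated from the document), then a single extractor pass over all nodes. Here is a minimal sketch of that pattern -- these classes and functions are simplified stand-ins, not the actual llama_index implementations:

```python
# Simplified sketch of the two-phase flow; NOT the real llama_index code.
from dataclasses import dataclass, field

@dataclass
class Node:
    text: str
    metadata: dict = field(default_factory=dict)

def get_nodes_from_document(document, chunk_size=20):
    # Phase 1: split one document into nodes, copying the metadata known
    # so far (e.g. a custom file_metadata or reader-supplied keys like "page").
    text = document["text"]
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    return [Node(chunk, dict(document["metadata"])) for chunk in chunks]

def process_nodes(all_nodes):
    # Phase 2: the metadata extractor enriches every node's metadata in place,
    # only after ALL documents have been split.
    for node in all_nodes:
        node.metadata["char_count"] = len(node.text)
    return all_nodes

docs = [{"text": "a" * 45, "metadata": {"page": 1}}]
all_nodes = [n for d in docs for n in get_nodes_from_document(d)]
process_nodes(all_nodes)
# 45 chars with chunk_size 20 -> 3 nodes; each keeps the reader's "page" key
# and gains the extractor's "char_count" key.
```

The point of the ordering is that the extractor sees the complete node list at once, rather than being interleaved with splitting.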
But maybe I'm wrong? 🥴