This part?
from llama_index.core.node_parser import MarkdownElementNodeParser
from llama_index.llms.openai import OpenAI

node_parser = MarkdownElementNodeParser(
    llm=OpenAI(model="gpt-3.5-turbo-0125"), num_workers=8
)
nodes = node_parser.get_nodes_from_documents(documents)
You could save a map of document -> nodes to disk as a cache:
document_to_nodes = {}
for node in nodes:
    document_to_nodes.setdefault(node.ref_doc_id, []).append(node.model_dump())
Then just pickle that dict to disk as the cache (or dump it as JSON, since model_dump() gives you plain dicts).
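A minimal sketch of the save/load round-trip, assuming a pickle file on disk (the cache path and helper names here are just illustrative, not a LlamaIndex API):

```python
import pickle
from pathlib import Path

# Hypothetical cache location; pick whatever path suits your project.
CACHE_PATH = Path("node_cache.pkl")

def save_node_cache(document_to_nodes, path=CACHE_PATH):
    """Pickle the {ref_doc_id: [node dicts]} map to disk."""
    with open(path, "wb") as f:
        pickle.dump(document_to_nodes, f)

def load_node_cache(path=CACHE_PATH):
    """Return the cached map, or None if no cache file exists yet."""
    path = Path(path)
    if not path.exists():
        return None
    with open(path, "rb") as f:
        return pickle.load(f)
```

On load you can skip re-parsing any document whose id is already in the map. Since nodes in recent LlamaIndex versions are Pydantic models, you should be able to rebuild node objects from the dumped dicts with something like TextNode.model_validate(d), but check that against your installed version.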