SimpleDirectoryReader
I lose the information about which page number the text comes from (it's obviously no longer part of the metadata, since the file no longer has pages like a PDF does). If I'm lucky, the page number is in the source node, but that's not reliable enough.

Document or TextNode objects, there's a ton of useful metadata in there (including page numbers, filenames, etc.)

parser = LlamaParse(...)
json_results = parser.get_json_result(["file1.pdf", ...])
doc = Document(text=text, metadata={...})
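
To make the snippet above a bit more concrete, here is a rough sketch of how the per-page metadata could be pulled out of the JSON results before building Document objects. The helper name pages_to_records and the sample data are made up for illustration, and the key names ("pages", "page", "text", "file_path") are an assumption about the layout get_json_result returns, so print one result and check before relying on them.

```python
# Sketch only: the JSON layout used here ("pages", "page", "text", "file_path")
# is an assumption about the parser output; verify it against your own results.

def pages_to_records(json_results):
    """Flatten parsed results into one {text, metadata} record per page."""
    records = []
    for result in json_results:
        file_path = result.get("file_path")  # assumed top-level key
        for page in result.get("pages", []):
            records.append(
                {
                    "text": page.get("text", ""),
                    "metadata": {
                        "page_number": page.get("page"),  # assumed per-page key
                        "file_path": file_path,
                    },
                }
            )
    return records


# Stand-in data shaped like the assumed output, so the sketch runs offline.
sample_results = [
    {
        "file_path": "file1.pdf",
        "pages": [
            {"page": 1, "text": "Intro text"},
            {"page": 2, "text": "Details text"},
        ],
    }
]

records = pages_to_records(sample_results)

# With llama_index installed, each record maps onto the call from the thread:
#   docs = [Document(text=r["text"], metadata=r["metadata"]) for r in records]
```

Keeping one record per page means the page number travels with the text in the metadata, instead of depending on whatever ends up in the source node.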