node.ref_doc_id
ref_doc_id
, but for a large .pdf, for example, it still outputs multiple parts of the same PDF with different ref_doc_id. I expected this to stay consistent for the whole PDF.

from pathlib import Path
from llama_index.readers import PDFReader

pdf_reader = PDFReader(return_full_document=True)
documents = pdf_reader.load_data(Path('huge2.pdf'))

from llama_index import SimpleDirectoryReader
from llama_index.readers import PDFReader

full_pdf_reader = PDFReader(return_full_document=True)
documents = SimpleDirectoryReader(
    './',
    input_files=['huge2.pdf'],
    file_extractor={'.pdf': full_pdf_reader},
).load_data()
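
To make the symptom easier to see, here is a minimal sketch that counts how many distinct document ids come back per source file. It uses a hypothetical `StubDocument` stand-in rather than the real llama_index class, and it assumes each loaded document exposes a `doc_id` and a `metadata` dict with a `file_name` key; the helper name `doc_ids_per_file` is also made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class StubDocument:
    """Hypothetical stand-in for a loaded document (not the llama_index class)."""
    doc_id: str
    metadata: dict = field(default_factory=dict)


def doc_ids_per_file(documents):
    """Map each source file name to the set of distinct doc ids loaded from it."""
    ids_by_file = defaultdict(set)
    for doc in documents:
        # Assumption: the reader stores the source file under 'file_name'.
        file_name = doc.metadata.get("file_name", "<unknown>")
        ids_by_file[file_name].add(doc.doc_id)
    return dict(ids_by_file)


# Made-up sample data: two parts from the same PDF, one from another file.
sample = [
    StubDocument("a1", {"file_name": "huge2.pdf"}),
    StubDocument("b2", {"file_name": "huge2.pdf"}),
    StubDocument("c3", {"file_name": "small.pdf"}),
]

result = doc_ids_per_file(sample)
for name, ids in sorted(result.items()):
    print(name, len(ids))
```

If the assumptions hold for the real objects, passing the `documents` list from the snippets above in place of `sample` should show whether a single PDF maps to more than one id.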