
Updated 4 months ago

How can I debug / influence the


How can I debug or influence the extraction of metadata from PDFs? My problem: when I break one of my PDFs into Nodes (using the SimpleDirectoryReader and the SentenceSplitter), the Node metadata 'page_label' (document page number) is empty. This happens with just one PDF; it works fine for others. Any advice? Thanks!

2 comments

WhiteFang_Jr

If it happens with just one PDF, could the issue be in the PDF itself?

You can manually set the page_label and other metadata for this PDF

alx

This is exactly what I did. In the returned document list, every entry represents one page. Worked like a charm, thank you!
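For anyone landing here later, a minimal sketch of that workaround, assuming the loader returns one document per page in page order and that each document exposes a `metadata` dict. `PageDoc` and `fill_page_labels` are hypothetical names made up for this illustration (not part of LlamaIndex), and `problem.pdf` is a placeholder path. The label is written as a string and only filled in where it is missing or empty.

```python
from dataclasses import dataclass, field


@dataclass
class PageDoc:
    """Hypothetical stand-in for a loaded page; only a `metadata` dict is needed."""
    text: str
    metadata: dict = field(default_factory=dict)


def fill_page_labels(documents, start=1):
    """Fill a missing or empty 'page_label' from each document's position in the list."""
    for page_number, doc in enumerate(documents, start=start):
        if not doc.metadata.get("page_label"):
            doc.metadata["page_label"] = str(page_number)
    return documents


# With LlamaIndex, this would run on the loaded documents before splitting, roughly:
#   documents = SimpleDirectoryReader(input_files=["problem.pdf"]).load_data()
#   fill_page_labels(documents)
#   nodes = SentenceSplitter().get_nodes_from_documents(documents)
pages = [PageDoc("first page"), PageDoc("second page", {"page_label": "ii"})]
fill_page_labels(pages)
```

Existing labels are left alone, so the same helper can be run over PDFs that already load correctly. Printing `doc.metadata` for each loaded page before splitting is also a quick way to see which pages come back without a label.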
