Updated 2 months ago

How can I debug / influence the extraction of metadata from PDFs?

How can I debug / influence the extraction of metadata from PDFs? My problem: when I break one of my PDFs into nodes (using the SimpleDirectoryReader and the SentenceSplitter), the node metadata 'page_label' (the document page number) is empty. This happens with just one PDF; it works fine for the others. Any advice? Thanks!
2 comments
If it happens with just one PDF, the issue could be in the PDF itself.

You can manually set the page_label and other metadata for this PDF.
This is exactly what I did. In the returned document list, every entry represents one page. Worked like a charm, thank you!
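For anyone landing here later, the workaround above might look roughly like the sketch below. It assumes what the thread describes: the reader returns one document per page, in page order, and each document has a `metadata` dict. The helper name `fill_page_labels` is made up for illustration, and the stand-in objects only mimic that `metadata` attribute so the snippet runs without a PDF. Labels are stored as strings here; that is an assumption, so match whatever your working PDFs produce.

```python
from types import SimpleNamespace


def fill_page_labels(documents, start=1):
    """Set metadata['page_label'] from list position where it is missing or empty.

    Hypothetical helper: assumes one document per page, in page order.
    Existing non-empty labels are left untouched.
    """
    for page_number, doc in enumerate(documents, start=start):
        if not doc.metadata.get("page_label"):
            doc.metadata["page_label"] = str(page_number)
    return documents


# Stand-ins for the per-page documents a reader would return.
pages = [
    SimpleNamespace(metadata={}),                    # label missing
    SimpleNamespace(metadata={"page_label": "ii"}),  # label already present
    SimpleNamespace(metadata={"page_label": ""}),    # label empty
]
fill_page_labels(pages)

# With real data this would be applied to the list returned by the reader
# for the problem PDF, before passing the documents to the SentenceSplitter.
```

Note that a position-based number is the physical page index, which can differ from the label printed on the page (for example, Roman numerals in front matter).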