What Should I Do When a PDF with Selectable Text Was Not Extracted Correctly?
At a glance
The post asks for help when a PDF with selectable text was not extracted correctly. The first comment suggests using the LlamaParse library as a potential solution. The second comment notes that each page of a PDF is treated as a separate document by default, and suggests creating a custom PDF reader if the issue is that the text spans across page boundaries. There is no explicitly marked answer in the comments.
What is the problem? By default, each page of a PDF is treated as a separate document. If your problem is that a chunk starts at the bottom of one page and ends at the top of the next, you need to write a custom PDF reader; in that case this solution could work.
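A minimal sketch of what such a custom reader could look like, assuming the goal is simply to join all pages into one text before chunking. The thread does not say which library does the page-level reading, so the use of pypdf here is an assumption, and the helper names (merge_pages, chunk_text, read_pdf_as_single_text) are hypothetical and only for illustration.

```python
from typing import Iterable, List


def merge_pages(page_texts: Iterable[str]) -> str:
    """Join per-page text into one string so chunks can cross page boundaries."""
    # Skip pages that yielded no text (e.g. blank or image-only pages).
    cleaned = [text.strip() for text in page_texts if text and text.strip()]
    return "\n".join(cleaned)


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> List[str]:
    """Split text into fixed-size character chunks with a small overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def read_pdf_as_single_text(path: str) -> str:
    """Hypothetical reader: extract every page and return one merged string.

    Assumes the third-party pypdf package is installed; imported lazily so the
    helpers above can be used without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() can return an empty result for pages with no text layer.
    return merge_pages(page.extract_text() or "" for page in reader.pages)
```

With the pages merged first, a sentence that ends one page and continues on the next can land in a single chunk, for example chunk_text(read_pdf_as_single_text("file.pdf")). The trade-off is that page numbers are no longer attached to each chunk unless you track them separately.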