
Updated 2 months ago

What Should I Do When a PDF with Selectable Text Was Not Extracted Correctly?

At a glance

The post asks for help when a PDF with selectable text was not extracted correctly. The first comment suggests using the LlamaParse library as a potential solution. The second comment notes that each page of a PDF is treated as a separate document by default, and suggests creating a custom PDF reader if the issue is that the text spans across page boundaries. There is no explicitly marked answer in the comments.


ccecin

Hello
What should I do when a PDF with selectable text was not extracted correctly?
Thank you

2 comments

WhiteFang_Jr

Hi, what parser are you using?
If you have not used LlamaParse, I would highly suggest that you give it a try!
https://docs.llamaindex.ai/en/stable/llama_cloud/llama_parse/

Restodecoca

What is the problem? By default, each page of a PDF is treated as a separate doc. If your problem is that a chunk starts at the bottom of one page and ends at the top of the next, you need to make a custom PDF reader; then this solution could work.
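
As a rough illustration of the custom-reader idea, here is a minimal sketch that joins the per-page text into one string so that later chunking can cross page boundaries. The names `merge_pages` and `load_pdf_as_one_text` are hypothetical, and treating a trailing hyphen as a word split across pages is an assumption that may not hold for every PDF. The second helper assumes the third-party `pypdf` package is installed; it is not taken from the thread.

```python
from typing import Iterable


def merge_pages(page_texts: Iterable[str]) -> str:
    """Join per-page text into one string so chunks can cross page breaks."""
    merged = ""
    for text in page_texts:
        text = text.strip()
        if not text:
            continue  # skip blank pages
        if merged.endswith("-"):
            # Assumption: a trailing hyphen marks a word split across pages.
            merged = merged[:-1] + text
        elif merged:
            merged += " " + text
        else:
            merged = text
    return merged


def load_pdf_as_one_text(path: str) -> str:
    """Hypothetical helper: read every page with pypdf, then merge them."""
    from pypdf import PdfReader  # third-party; imported lazily

    reader = PdfReader(path)
    # extract_text() can return an empty result for image-only pages.
    return merge_pages((page.extract_text() or "") for page in reader.pages)
```

The merged string can then be wrapped in a single document object (for example, LlamaIndex's `Document(text=...)`) before it goes to the text splitter, so the whole PDF is chunked as one unit instead of page by page.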
