
Updated 2 months ago

What Should I Do When a PDF with Selectable Text Was Not Extracted Correctly?

At a glance

The post asks for help when a PDF with selectable text was not extracted correctly. The first comment suggests using the LlamaParse library as a potential solution. The second comment notes that each page of a PDF is treated as a separate document by default, and suggests creating a custom PDF reader if the issue is that the text spans across page boundaries. There is no explicitly marked answer in the comments.

Hello
What should I do when a PDF with selectable text was not extracted correctly?
Thank you
2 comments
Hi, What parser are you using?
If you have not used LlamaParse, I would highly suggest that you give it a try!
https://docs.llamaindex.ai/en/stable/llama_cloud/llama_parse/
What exactly is the problem? By default, each page of a PDF is treated as a separate document. If your problem is that a chunk starts at the bottom of one page and ends at the top of the next, you need a custom PDF reader that combines the pages before chunking, and that could solve it.
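A minimal sketch of the second comment's idea, assuming you already have each page's text as a plain string (for example, from whatever reader produced the per-page documents). Roughly, in LlamaIndex terms, this means joining the text of the per-page documents into a single document before it is split. `merge_pages`, `chunk`, and `pages_spanned` are hypothetical helpers written for illustration, not LlamaIndex APIs, and fixed-size character chunking stands in for whatever splitter you actually use.

```python
import bisect
from typing import List, Tuple

# Hypothetical helpers (not a LlamaIndex API): merge page texts so chunks can cross page breaks.

def merge_pages(page_texts: List[str], separator: str = "\n") -> Tuple[str, List[int]]:
    """Concatenate per-page texts into one string.

    Also returns the character offset where each page starts, so a chunk
    can still be mapped back to the page numbers it came from.
    """
    starts: List[int] = []
    pos = 0
    for i, page in enumerate(page_texts):
        if i > 0:
            pos += len(separator)  # account for the separator inserted before this page
        starts.append(pos)
        pos += len(page)
    return separator.join(page_texts), starts


def chunk(text: str, size: int, overlap: int = 0) -> List[Tuple[int, str]]:
    """Split text into fixed-size character windows, returning (start_offset, chunk) pairs."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    out: List[Tuple[int, str]] = []
    for start in range(0, len(text), step):
        out.append((start, text[start:start + size]))
        if start + size >= len(text):
            break  # the last window already reaches the end of the text
    return out


def pages_spanned(start: int, end: int, page_starts: List[int]) -> List[int]:
    """Return the 1-based page numbers covered by the character range [start, end)."""
    first = bisect.bisect_right(page_starts, start)
    last = bisect.bisect_right(page_starts, max(start, end - 1))
    return list(range(first, last + 1))


# Demo: a sentence that starts at the bottom of page 1 and ends at the top of page 2.
pages = ["Refunds are issued within", "30 days of the return request."]
text, starts = merge_pages(pages, separator=" ")
for start, piece in chunk(text, size=100):
    print(pages_spanned(start, start + len(piece), starts), piece)
# prints: [1, 2] Refunds are issued within 30 days of the return request.
```

With one document per page, no chunk can ever contain the whole sentence; after merging, it lands in a single chunk while the recorded page offsets still tell you it came from pages 1 and 2.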