Updated 6 months ago

Does the PDF reader use OCR?

At a glance

The post asks whether the PDF reader uses OCR. According to the replies, it does not yet: the PDF reader uses a basic pdf2text parser. A community member points to the Donut model, which the project already uses to parse images but not PDFs. When asked why Donut is not used for PDFs and whether a simpler alternative exists, the reply does not give a reason, but says a contributed Donut-based PDF parser would be appreciated.

Does the PDF reader use OCR?
7 comments
not yet, it uses a basic pdf2text parser
@conceptofmind in LangChain discord likes the following: https://github.com/clovaai/donut
we use the donut model to parse images!
so i amend my previous statement. the image parser uses the donut model, the pdf parser does not
Sorry for the late response, but is there a reason you don't use the donut model to parse PDFs? And is there a simpler alternative for PDFs?
we have a donut model to process images, but if you want to add a donut pdf parser that would be appreciated too!
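For anyone who wants to try the contribution suggested above, here is a rough sketch of one way a PDF parser could fall back to an image parser when a page has no usable text layer. This is not the project's actual code: `parse_pdf_pages`, `extract_text`, `image_parser`, and `min_chars` are hypothetical names, and both parsers are passed in as plain callables so that a pdf2text-style extractor and a Donut-based image parser could be plugged in.

```python
from typing import Callable, List


def parse_pdf_pages(
    pages: List[bytes],
    extract_text: Callable[[bytes], str],
    image_parser: Callable[[bytes], str],
    min_chars: int = 20,
) -> List[str]:
    """Return one text string per page.

    Hypothetical sketch: try the plain text extractor first, and only
    call the (slower) image parser when the page yields almost no text,
    which usually means the page is a scanned image.
    """
    results = []
    for page in pages:
        text = extract_text(page).strip()
        if len(text) < min_chars:
            # Assumption: too little text means no usable text layer,
            # so hand the page to the image-based parser instead.
            text = image_parser(page).strip()
        results.append(text)
    return results


if __name__ == "__main__":
    # Stand-in callables for illustration only; a real setup would wrap
    # a PDF text extractor and an image model such as Donut.
    fake_extract = lambda page: page.decode("utf-8")
    fake_image_parser = lambda page: "[text recovered from page image]"

    pages = [b"This page has a normal text layer.", b""]
    for text in parse_pdf_pages(pages, fake_extract, fake_image_parser):
        print(text)
```

The `min_chars` threshold is a guess and would need tuning; pages that mix a short text layer with scanned content would need a smarter check.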