Best Options for Parsing Payroll PDFs

At a glance

A community member wants to parse a payroll PDF and has tried llamaparse, but it did not give good results. In the comments, one community member suggests that llamaparse should work fine, especially in premium mode. Another community member offers alternatives: for 'nice' PDFs, the Python OCR library pytesseract works okay, but on not-nice scans it can make mistakes. They also mention that Google Cloud's DocumentAI has good off-the-shelf document parsers, plus a 'custom extractor' option for hand-labeling documents, which can be pretty much perfect if the labeling is good. However, they note that navigating the GCP interface is annoying.

Balagona

Hello guys,
I want to parse a payroll PDF. What would be the best option for me? I have tried using llamaparse, but it's not giving good results.

2 comments

Logan M

llamaparse should be fine tbh, especially if you use premium mode

zgongax

For 'nice' PDFs the Python OCR library pytesseract works ok. For not-nice scans it can make mistakes. Google Cloud's DocumentAI has good off-the-shelf document parsers. There's also an option to hand-label documents (the 'custom extractor'), and if your labeling is good, it's pretty much perfect. However, it's annoying to navigate the GCP interface, unfortunately.
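For anyone who wants to try the pytesseract route, here is a rough sketch of what it could look like. It is only an illustration, not a tested payroll parser: the field labels ("Gross Pay", "Net Pay"), the regex patterns, and the helper names `ocr_image` and `extract_fields` are all assumptions made up for the example, and real payslips will need their own patterns. It assumes `pytesseract`, Pillow, and the Tesseract binary are installed.

```python
import re
from decimal import Decimal

# Hypothetical field labels; real payslips vary, so adjust these patterns.
FIELD_PATTERNS = {
    "gross_pay": re.compile(
        r"Gross\s+Pay\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE
    ),
    "net_pay": re.compile(
        r"Net\s+Pay\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE
    ),
}


def ocr_image(image_path):
    """Run Tesseract OCR on one page image and return the raw text."""
    # Imported lazily so extract_fields() can be used without OCR installed.
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(image_path))


def extract_fields(text):
    """Pull labeled amounts out of OCR text; missing fields map to None."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            # Drop thousands separators before converting to Decimal.
            fields[name] = Decimal(match.group(1).replace(",", ""))
        else:
            fields[name] = None
    return fields
```

Usage would be something like `extract_fields(ocr_image("payslip_page1.png"))`, where the file name is a placeholder. pytesseract works on images, so a PDF has to be rendered to page images first (for example with the `pdf2image` package). On noisy scans the OCR text may contain misread digits, so it is worth checking any field that comes back as `None` or looks off.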
