
Updated 3 months ago

What has been your best document parser?

What has been your best document parser, especially for PDFs with tables? So far it looks like only llmsherpa and LlamaParse (which I can't use) did a good job with tables.
Can you tell us why you can't use LlamaParse?
Privacy issues: documents can't leave the network (these are actual legal proceedings that might still be active).
We would be super happy to have an on-prem option, given that it got me the best results, but for now it's a no-go.
Unstructured, on the other hand, is not very good.
You can contact us about the on-prem option through the platform contact form.
Also, the best alternative to LlamaParse is likely PyMuPDF.
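Since the documents have to stay on the network, a fully local PyMuPDF pass is easy to try. A minimal sketch, assuming PyMuPDF 1.23 or newer (the release that added `Page.find_tables()`): detect tables page by page, then render each table's rows as Markdown so they can be chunked for retrieval. The helper names `rows_to_markdown` and `pdf_tables_as_markdown` are hypothetical, and the Markdown rendering is one choice among several, not something PyMuPDF prescribes.

```python
from typing import List, Optional

Row = List[Optional[str]]


def rows_to_markdown(rows: List[Row]) -> str:
    """Render table rows as a Markdown table, treating the first row as the header.

    Cells may be None (e.g. positions covered by a merged cell); they become "".
    Newlines inside a cell are flattened to spaces and pipes are escaped.
    """
    if not rows:
        return ""
    width = max(len(r) for r in rows)

    def clean(cell: Optional[str]) -> str:
        text = "" if cell is None else str(cell)
        return text.replace("\n", " ").replace("|", "\\|").strip()

    # Pad short rows so every line has the same number of columns.
    norm = [[clean(c) for c in r] + [""] * (width - len(r)) for r in rows]
    header, body = norm[0], norm[1:]
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * width]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)


def pdf_tables_as_markdown(path: str) -> List[str]:
    """Detect tables on every page with PyMuPDF and return each one as Markdown.

    Assumes PyMuPDF >= 1.23 (Page.find_tables). The import is lazy so the
    formatting helper above works without PyMuPDF installed.
    """
    import fitz  # PyMuPDF's classic module name; newer releases also accept "import pymupdf"

    out: List[str] = []
    with fitz.open(path) as doc:
        for page in doc:
            for table in page.find_tables().tables:
                # Table.extract() returns the table as a list of rows of cell text.
                out.append(rows_to_markdown(table.extract()))
    return out
```

Usage (the filename is a placeholder): `for md in pdf_tables_as_markdown("filing.pdf"): print(md)`. Everything runs on the local machine, so nothing is sent to an external service.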
(slight tangent)
Is PyMuPDF better than Camelot at tables? Or, if we are dealing specifically with tables, do you think Camelot is the way to go?
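One practical way to settle the Camelot question is to run both tools over the same sample of real documents and compare. A minimal sketch of the Camelot side, assuming the `camelot-py` package: `flavor="lattice"` targets tables drawn with ruling lines, `flavor="stream"` targets whitespace-aligned ones, and each detected table carries a `parsing_report` dict whose `accuracy` value can be used to discard weak detections. The 90% cutoff and the helper names `keep_confident_tables` and `camelot_tables` are hypothetical, not Camelot defaults.

```python
from typing import Any, Iterable, List


def keep_confident_tables(tables: Iterable[Any], min_accuracy: float = 90.0) -> List[Any]:
    """Keep tables whose Camelot parsing_report accuracy is >= min_accuracy.

    The 90.0 default is an arbitrary, hypothetical threshold; tune it on your own documents.
    """
    return [t for t in tables if t.parsing_report.get("accuracy", 0.0) >= min_accuracy]


def camelot_tables(path: str, pages: str = "all", flavor: str = "lattice") -> List[Any]:
    """Extract tables with Camelot and return the confident ones as pandas DataFrames."""
    import camelot  # from the camelot-py package; imported lazily

    tables = camelot.read_pdf(path, pages=pages, flavor=flavor)
    return [t.df for t in keep_confident_tables(tables)]
```

Usage (placeholder filename): `dfs = camelot_tables("filing.pdf", flavor="stream")` for tables without ruling lines. Comparing these DataFrames against PyMuPDF's output on the same pages gives a concrete answer for your own documents.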
Does LlamaParse have the features PyMuPDF has for repairing corrupted PDFs? That's intriguing and helpful. Would you use them in concert?
I think I will give PyMuPDF a quick go.