
Updated 5 months ago

PDF parsing issue with multi-page tables

At a glance

The community member is having an issue parsing a PDF file, where the parser does not work for tables that start on one page and end on another. The same parsing instructions and file used to work 2 weeks ago. The community members discuss potential workarounds, with one suggesting that getting an LLM (Large Language Model) to parse the outputs of every page and combine the tables that are continuations of each other could be an option. However, there is no explicitly marked answer provided.

versa

Hello again.
I'm having an issue parsing a PDF. The same parsing instructions and file worked 2 weeks ago.

Job id is 4d3d72d7-ab11-4593-83e5-89672a1a523f

The parser doesn't seem to work for tables that start on one page and end on another. I'll leave a screenshot.
The screenshot does not contain any sensitive information and it's not private, btw.

Any ideas on a workaround?

Attachment

3 comments

Logan M

Multi-page spanning tables are a feature that is being worked on, but yes, not quite supported yet.

versa

Sure, no problem. Would you have any suggestions for a workaround? It would help me a lot.

Logan M

I don't think there is an easy workaround. One option is to have an LLM parse the per-page outputs from llama-parse and combine tables that are continuations of each other (passing in pairs of pages at a time, I suppose?)
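
For anyone who wants to try the pairwise idea, here is a rough sketch of the stitching step. It assumes you already have one markdown string per page, with tables rendered as pipe-delimited markdown rows; how you get those strings out of llama-parse is not shown. All function names here (`merge_pages`, `same_column_count`, and so on) are made up for illustration and are not part of any library. The decision of whether two tables belong together is a pluggable `same_table` function: the default only compares column counts, and it could be swapped for a call to an LLM as suggested above.

```python
from typing import Callable, List, Optional, Tuple

def is_table_line(line: str) -> bool:
    """Treat a line as a markdown table row if it starts and ends with a pipe."""
    s = line.strip()
    return len(s) >= 2 and s.startswith("|") and s.endswith("|")

def cells(line: str) -> List[str]:
    """Split a table row into stripped cells (escaped pipes are not handled)."""
    return [c.strip() for c in line.strip()[1:-1].split("|")]

def is_separator(line: str) -> bool:
    """True for a header separator row such as |---|:--:|."""
    return all(c and set(c) <= set("-:") and "-" in c for c in cells(line))

def trailing_table(lines: List[str]) -> Optional[Tuple[int, int]]:
    """(start, end) of a table that ends the page, ignoring trailing blank lines."""
    end = len(lines)
    while end > 0 and not lines[end - 1].strip():
        end -= 1
    start = end
    while start > 0 and is_table_line(lines[start - 1]):
        start -= 1
    return (start, end) if start < end else None

def leading_table(lines: List[str]) -> Optional[Tuple[int, int]]:
    """(start, end) of a table that opens the page, ignoring leading blank lines."""
    start = 0
    while start < len(lines) and not lines[start].strip():
        start += 1
    end = start
    while end < len(lines) and is_table_line(lines[end]):
        end += 1
    return (start, end) if start < end else None

def same_column_count(prev: List[str], nxt: List[str]) -> bool:
    """Default heuristic (an assumption): same column count means same table."""
    return len(cells(prev[0])) == len(cells(nxt[0]))

def merge_pages(
    pages: List[str],
    same_table: Callable[[List[str], List[str]], bool] = same_column_count,
) -> str:
    """Join page outputs, stitching a table that continues onto the next page."""
    merged: List[str] = pages[0].splitlines() if pages else []
    for page in pages[1:]:
        nxt = page.splitlines()
        tail, head = trailing_table(merged), leading_table(nxt)
        if tail and head and same_table(merged[tail[0]:tail[1]], nxt[head[0]:head[1]]):
            cont = nxt[head[0]:head[1]]
            if len(cont) >= 2 and is_separator(cont[1]):
                if cells(cont[0]) == cells(merged[tail[0]]):
                    cont = cont[2:]              # repeated header: drop it
                else:
                    cont = [cont[0]] + cont[2:]  # data row mistaken for a header
            merged = merged[:tail[1]] + cont + nxt[head[1]:]
        else:
            merged = merged + [""] + nxt
    return "\n".join(merged)
```

Usage would be something like `merge_pages([page1_md, page2_md, page3_md])`. Because each page is compared against everything merged so far, a table spanning three or more pages gets stitched one page at a time. The column-count check will wrongly join two unrelated tables that happen to have the same width, which is where an LLM-backed `same_table` might do better.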
