Hi Team, I was using UnstructuredElementNodeParser to split documents into text nodes and index nodes. This works really well with HTML documents (after loading them using FlatReader). However, it fails to split tables into index nodes when I do the same with PDF documents (after loading them using PDFReader). Is there a way to solve this issue? Thanks in advance.
PDF parsing is hard; unstructured is likely not identifying the tables properly in the PDF.
You could try a tool like camelot to pull the tables out as well 🤔
But if unstructured is failing, camelot probably will too.
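To illustrate the first reply's point, here is a minimal, library-free sketch. All names here are hypothetical and are not the LlamaIndex or unstructured API. It shows how an element-based splitter of this kind can only emit a separate table node for elements the upstream parser has already labeled as a table. If PDF text extraction flattens a table into ordinary text, there is nothing left for the splitter to pull out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Element:
    # Hypothetical stand-in for a parsed document element.
    category: str  # e.g. "Table", "NarrativeText", "Title"
    text: str


def split_elements(elements: List[Element]) -> Tuple[List[str], List[str]]:
    """Group consecutive non-table elements into text chunks; emit each table separately."""
    text_nodes: List[str] = []
    table_nodes: List[str] = []
    buffer: List[str] = []
    for el in elements:
        if el.category == "Table":
            # Flush any pending text before recording the table on its own.
            if buffer:
                text_nodes.append("\n".join(buffer))
                buffer = []
            table_nodes.append(el.text)
        else:
            buffer.append(el.text)
    if buffer:
        text_nodes.append("\n".join(buffer))
    return text_nodes, table_nodes


# Table recognized as such (e.g. from a <table> tag in HTML).
html_like = [
    Element("Title", "Quarterly report"),
    Element("Table", "Q1 | 10\nQ2 | 12"),
    Element("NarrativeText", "Revenue grew."),
]

# Same content after a text extractor flattened the table into plain text.
pdf_like = [
    Element("Title", "Quarterly report"),
    Element("NarrativeText", "Q1 10 Q2 12"),
    Element("NarrativeText", "Revenue grew."),
]

print(split_elements(html_like))  # one table node
print(split_elements(pdf_like))   # no table nodes
```

In practice, this means the fix has to happen at the parsing step, for example with a PDF parser that detects table structure, rather than in the node splitter itself.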