Updated last year

Hi Team I was using

Hi Team, I was using UnstructuredElementNodeParser to split documents into text nodes and index nodes. This works really well with HTML documents (after loading them using FlatReader). However, it fails to split tables into index nodes when we do the same with PDF documents (after loading them using PDFReader). Is there a way to solve this issue? Thanks in advance.

3 comments

Logan M

PDF parsing is hard -- unstructured is likely not identifying the tables properly in the PDF

Logan M

Could try a tool like camelot to pull the tables out as well 🤔

Logan M

but if unstructured is failing, camelot probably will too
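
For anyone who wants to try the camelot route anyway, here is a minimal sketch, not a tested fix for this issue. It assumes `camelot-py` is installed and that the PDF has a real text layer (camelot does not work on scanned images). The helper names `extract_tables` and `rows_to_markdown` are made up for illustration; only `camelot.read_pdf` and the `.df` attribute on each table come from camelot itself.

```python
from typing import List


def extract_tables(pdf_path: str, pages: str = "1-end") -> List[List[List[str]]]:
    """Return every table camelot detects, as rows of cell strings."""
    # Imported lazily so the Markdown helper below works without camelot.
    import camelot  # assumption: installed via `pip install camelot-py`

    # flavor="lattice" expects tables with ruled cell borders;
    # flavor="stream" is the alternative for whitespace-separated tables.
    tables = camelot.read_pdf(pdf_path, pages=pages, flavor="lattice")
    # Each detected table exposes its cells as a pandas DataFrame in `.df`.
    return [t.df.astype(str).values.tolist() for t in tables]


def rows_to_markdown(rows: List[List[str]]) -> str:
    """Render rows as a Markdown table, treating the first row as the header."""
    if not rows:
        return ""

    def clean(cell: object) -> str:
        # Flatten line breaks and escape pipes so cells stay on one row.
        return str(cell).replace("\n", " ").replace("|", "\\|").strip()

    header, *body = [[clean(c) for c in row] for row in rows]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)
```

If `extract_tables` returns an empty list with both flavors, that suggests camelot cannot see the tables either, which would match the caveat above. If it does find them, the Markdown strings could be indexed as separate table text alongside the nodes from the regular parser.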
