Unstructured

At a glance

Community members discuss the UnstructuredElementNodeParser and the difficulty of parsing HTML and PDF documents. The original poster asks whether there is an easy way to make the parser use a hi-res strategy. A reply points out that the parser appears to run on HTML text, while the poster ultimately wants to parse PDFs, where tables and images were not being captured correctly. The replies agree that PDFs are hard to parse. The poster also reports that the HTML route did not handle tables for them, and concludes that a small custom function for PDFs will probably be needed.

NPC_Kenny

When using UnstructuredElementNodeParser, is there an easy way to get it to use a hi-res strategy?

5 comments

Logan M

Hmmm, hi-res for HTML?

Logan M

I was under the impression the UnstructuredElementNodeParser runs on HTML text

NPC_Kenny

Oh, does it? I was planning on using a PDF ultimately, but it doesn't seem to capture all the tables and images correctly. I'll try it again when I get home.

Logan M

PDFs are low-key a nightmare to parse. I don't think anyone has solved it yet 😅

NPC_Kenny

So I noticed that HTML doesn't support tables, which means I am going to have to write a little function for using PDFs.
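
For anyone landing here later, one possible shape for that "little function" is sketched below. It is not from the thread: it assumes the `unstructured` package with its PDF extras is installed, uses `partition_pdf` with `strategy="hi_res"` and `infer_table_structure=True`, and then renders the elements as HTML so a parser that expects HTML text has something to work with. The names `partition_pdf_hi_res` and `elements_to_html` are hypothetical helpers, and the stand-in elements at the bottom are made up purely to show the output shape.

```python
from html import escape
from types import SimpleNamespace


def partition_pdf_hi_res(path):
    """Partition a PDF with unstructured's hi_res strategy.

    Assumption: `unstructured[pdf]` is installed. The import is done lazily
    so the pure-Python helper below works without that dependency.
    """
    from unstructured.partition.pdf import partition_pdf

    return partition_pdf(
        filename=path,
        strategy="hi_res",            # layout-model based strategy
        infer_table_structure=True,   # asks for metadata.text_as_html on tables
    )


def elements_to_html(elements):
    """Hypothetical helper: render partitioned elements as one HTML string."""
    parts = []
    for el in elements:
        category = getattr(el, "category", None)
        table_html = getattr(getattr(el, "metadata", None), "text_as_html", None)
        if category == "Table" and table_html:
            # Keep the table markup that the partitioner inferred.
            parts.append(table_html)
        elif category == "Title":
            parts.append(f"<h2>{escape(el.text)}</h2>")
        elif el.text:
            parts.append(f"<p>{escape(el.text)}</p>")
    return "<html><body>" + "\n".join(parts) + "</body></html>"


# Stand-in elements (not real partitioner output) to show the result shape.
demo_elements = [
    SimpleNamespace(category="Title", text="Q1 & Q2",
                    metadata=SimpleNamespace(text_as_html=None)),
    SimpleNamespace(category="Table", text="a b",
                    metadata=SimpleNamespace(
                        text_as_html="<table><tr><td>a</td><td>b</td></tr></table>")),
]
demo_html = elements_to_html(demo_elements)
```

With a real file this would be called as `elements_to_html(partition_pdf_hi_res("report.pdf"))`, where `report.pdf` is a placeholder path. The resulting string could then be wrapped in a document and passed to UnstructuredElementNodeParser, though whether the parser picks up the tables well from this output is untested here.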
