Updated 5 months ago

Unstructured

At a glance

The community members are discussing the UnstructuredElementNodeParser and the difficulty of parsing HTML and PDF documents. The original poster asks whether there is an easy way to make the UnstructuredElementNodeParser use a hi-res strategy. One commenter replies that, as far as they know, the parser runs on HTML text. The original poster had planned to use it on a PDF, but found that it did not capture all the tables and images correctly. A commenter notes that PDF parsing is still hard and largely unsolved. The original poster concludes that the HTML route does not handle tables and that they will need to write a small custom function for PDFs.

When using UnstructuredElementNodeParser, is there an easy way to get it to use a hi-res strategy?
5 comments
Hmm, hi-res for HTML?
I was under the impression the UnstructuredElementNodeParser runs on HTML text.
Oh, does it? I was planning on using a PDF ultimately, but it doesn't seem to capture all the tables and images correctly. I'll try it again when I get home.
PDFs are low-key a nightmare to parse. I don't think anyone has solved it yet 😅
So I noticed that HTML doesn't support tables, which means I am going to have to write a little function for using PDFs.
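For anyone landing here later, one possible shape for that little PDF function is sketched below. It assumes the `unstructured` library is installed with its PDF extras and calls `partition_pdf` directly with `strategy="hi_res"` and `infer_table_structure=True`, rather than going through UnstructuredElementNodeParser. The helper names `partition_pdf_hi_res` and `tables_as_html` are made up for this example, and this is not confirmed to be what the original poster ended up writing.

```python
def partition_pdf_hi_res(pdf_path):
    """Partition a PDF with unstructured's hi_res strategy.

    Assumption: `unstructured` with PDF extras is installed.
    """
    # Imported lazily so tables_as_html() below works without the dependency.
    from unstructured.partition.pdf import partition_pdf

    return partition_pdf(
        filename=pdf_path,
        strategy="hi_res",           # layout-model based parsing
        infer_table_structure=True,  # request an HTML rendering of detected tables
    )


def tables_as_html(elements):
    """Collect the HTML rendering of every table element that has one."""
    html_tables = []
    for element in elements:
        # unstructured marks table elements with category "Table".
        if getattr(element, "category", None) != "Table":
            continue
        html = getattr(element.metadata, "text_as_html", None)
        if html:
            html_tables.append(html)
    return html_tables
```

Calling `tables_as_html(partition_pdf_hi_res("report.pdf"))` (the file name is a placeholder) would give a list of HTML table strings. Since the parser is said above to run on HTML text, those strings might be a reasonable thing to hand to it, though that combination is untested here.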