Updated 2 months ago
Hello all,
isaackogan
9 months ago
Hello all,
I’m looking to expand past XML to PDFs, and the one big issue is the one issue everyone has: tables. Is there a recommended OSS way to read them? Specifically something you’d recommend be used with LlamaIndex?
15 comments
Logan M
9 months ago
probably unstructured will be the best OSS solution
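A minimal sketch of what that suggestion could look like, assuming the `unstructured` package is installed with its PDF extras and that `partition_pdf` is called with `infer_table_structure=True`, so table elements carry an HTML rendering in `metadata.text_as_html`. The helper names `extract_table_html` and `pdf_tables_as_html` are hypothetical and not part of either library.

```python
from typing import Any, Iterable, List


def extract_table_html(elements: Iterable[Any]) -> List[str]:
    """Collect an HTML rendering (or plain-text fallback) for each table element."""
    tables: List[str] = []
    for el in elements:
        # unstructured elements expose a `category` string such as "Table".
        if getattr(el, "category", None) != "Table":
            continue
        metadata = getattr(el, "metadata", None)
        html = getattr(metadata, "text_as_html", None)
        # Fall back to the raw text when no HTML structure was inferred.
        tables.append(html or getattr(el, "text", ""))
    return tables


def pdf_tables_as_html(path: str) -> List[str]:
    """Partition a PDF and return its tables (hypothetical helper)."""
    # Imported lazily so extract_table_html works without unstructured installed.
    from unstructured.partition.pdf import partition_pdf

    elements = partition_pdf(
        filename=path,
        strategy="hi_res",            # layout-model based strategy
        infer_table_structure=True,   # ask for table structure as HTML
    )
    return extract_table_html(elements)
```

Each returned string could then be wrapped in a LlamaIndex `Document` for indexing. How well the table structure survives depends heavily on the PDF.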
Logan M
9 months ago
but overall tables are hard
Logan M
9 months ago
marker is another OSS library that does ok-ish
isaackogan
9 months ago
Is OCR an acceptable solution
Logan M
9 months ago
OCR is really only half of the solution
isaackogan
9 months ago
in what sense
Logan M
9 months ago
Sure you can get the text -- but then you need to make sure it's formatted nicely
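To illustrate the "other half" mentioned here, a rough sketch of rebuilding rows and columns from OCR output. It assumes the OCR step yields each word with the x/y position of its top-left corner; the `Word` tuple, the tolerance values, and the sample words are all hypothetical. Real tables with merged cells, wrapped text, or skewed scans need much more than this.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, x of left edge, y of top edge)


def group_rows(words: List[Word], y_tol: float = 5.0) -> List[List[Word]]:
    """Group words whose y lies within y_tol of the first word in the row."""
    rows: List[List[Word]] = []
    for word in sorted(words, key=lambda w: (w[2], w[1])):
        if rows and abs(word[2] - rows[-1][0][2]) <= y_tol:
            rows[-1].append(word)
        else:
            rows.append([word])
    # Order the words in each row left to right.
    return [sorted(row, key=lambda w: w[1]) for row in rows]


def column_starts(words: List[Word], x_tol: float = 10.0) -> List[float]:
    """Cluster left edges into column start positions."""
    starts: List[float] = []
    for x in sorted(w[1] for w in words):
        if not starts or x - starts[-1] > x_tol:
            starts.append(x)
    return starts


def to_table(words: List[Word], y_tol: float = 5.0, x_tol: float = 10.0) -> List[List[str]]:
    """Place every word in the cell of its row and nearest column start."""
    starts = column_starts(words, x_tol)
    table: List[List[str]] = []
    for row in group_rows(words, y_tol):
        cells = [""] * len(starts)
        for text, x, _ in row:
            col = min(range(len(starts)), key=lambda i: abs(starts[i] - x))
            cells[col] = (cells[col] + " " + text).strip()
        table.append(cells)
    return table


def render(table: List[List[str]]) -> str:
    """Render the table as pipe-delimited text lines."""
    return "\n".join("| " + " | ".join(row) + " |" for row in table)


# Made-up sample input, slightly misaligned the way OCR output tends to be.
sample = [("Item", 10, 100), ("Amount", 120, 101),
          ("Salary", 10, 130), ("1000", 122, 129), ("Bonus", 11, 160)]
print(render(to_table(sample)))
```

The missing amount in the last row comes out as an empty cell instead of shifting values into the wrong column, which is the kind of formatting problem plain OCR text does not solve on its own.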
isaackogan
9 months ago
Oh of course yeah
isaackogan
9 months ago
Oh crap
isaackogan
9 months ago
And then there’s the issue of hyperlinks
isaackogan
9 months ago
God I hate PDFs
Logan M
9 months ago
it really is the worst file format possible lol
Logan M
9 months ago
and the most used
shawtyisaten
9 months ago
@isaackogan mind sharing the pdf file you're trying to read?
isaackogan
9 months ago
no sorry I’m testing with my employee pay statement 💀