
Is there any chance to read the images and tables in .pdf files?

Is there any chance to read the images and tables in .pdf files? For instance, if I use GPT-4 as the model for my service context? As of now I am using GPT-3.5-turbo.
13 comments
I read it, yes. The conclusion is negative as far as I understand. The only usable way, in my opinion and for my use case, is a text-only vector store with GPT-4V descriptions.
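A minimal sketch of that text-only idea, assuming the images have already been extracted from the PDF as files; the vision model name, import paths, and file paths below are assumptions, not something confirmed in the thread:

```python
# Sketch: caption each extracted image with GPT-4V, then index only the text captions.
import base64
from openai import OpenAI
from llama_index import Document, VectorStoreIndex  # pre-0.10 import path assumed

client = OpenAI()

def describe_image(path: str) -> str:
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4-vision-preview",  # assumed vision-capable model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail, including any table contents."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
        max_tokens=300,
    )
    return resp.choices[0].message.content

# Index the captions as plain text documents alongside the PDF text.
captions = [describe_image(p) for p in ["figures/page1_img1.png"]]  # hypothetical paths
index = VectorStoreIndex.from_documents([Document(text=c) for c in captions])
```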
Is SimpleDirectoryReader sufficient for parsing PDFs, or should I use a dedicated loader?
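For reference, the default SimpleDirectoryReader path looks roughly like this; it extracts plain text per page, so images are dropped and table layout is largely lost (the directory name is hypothetical):

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# SimpleDirectoryReader picks a default parser per file type (a basic PDF text
# extractor for .pdf), so it handles text-heavy PDFs but not images.
docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)
print(docs[0].text[:500])  # inspect what the default parser actually extracted
```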
Give this a try: https://github.com/nlmatics/llmsherpa

Not perfect, but already an impressive result!
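A rough sketch of the llmsherpa flow based on its README; the API URL is the public one from that README, the import paths assume a pre-0.10 llama_index, and the file name is hypothetical:

```python
from llmsherpa.readers import LayoutPDFReader
from llama_index import Document, VectorStoreIndex

# LayoutPDFReader sends the PDF to the nlm-ingestor service and returns
# layout-aware chunks (sections, tables) that can be indexed as text.
llmsherpa_api_url = "https://readers.llmsherpa.com/api/document/developer/parseDocument?renderFormat=all"
reader = LayoutPDFReader(llmsherpa_api_url)
doc = reader.read_pdf("report.pdf")  # hypothetical file

index = VectorStoreIndex([])
for chunk in doc.chunks():
    index.insert(Document(text=chunk.to_context_text(), extra_info={}))
```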
@LoLiPoPMaN Also check this out; I'm also trying to get PDF table reading to work: https://www.youtube.com/live/oa82yoJ6zYc?si=W2z4dPQsnRCJwHMc
Thanks for sharing!
It uses Unstructured.io, which they said in the video works better than OCR: https://llamahub.ai/l/file-unstructured
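A sketch of that LlamaHub loader in use; the loader name and call signature are as recalled from LlamaHub at the time and may have changed, and the file name is hypothetical:

```python
from llama_index import VectorStoreIndex, download_loader

# download_loader fetches UnstructuredReader, which uses Unstructured.io to
# partition the PDF into elements (text, titles, tables) before indexing.
UnstructuredReader = download_loader("UnstructuredReader")
loader = UnstructuredReader()
docs = loader.load_data(file="report_with_tables.pdf")
index = VectorStoreIndex.from_documents(docs)
```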
Will check it out. Appreciate it!
Ping me back if you have success I'm also curious. πŸ™‚
Will do... Let's keep the conversation in this thread? I'll probably look at it tomorrow.
I am struggling to read the images that are in my PDF, and I have been trying techniques similar to this. Do you think I'm on the right track, or would you have a better approach to getting images out of PDFs?
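One common approach (a sketch, not something endorsed in the thread) is to pull the embedded images out with PyMuPDF first, then caption them with a vision model as described above; the input file name is hypothetical:

```python
import fitz  # PyMuPDF

doc = fitz.open("input.pdf")  # hypothetical file
for page_index, page in enumerate(doc):
    for img_index, img in enumerate(page.get_images(full=True)):
        xref = img[0]
        pix = fitz.Pixmap(doc, xref)
        if pix.n - pix.alpha > 3:  # convert CMYK and similar to RGB before saving
            pix = fitz.Pixmap(fitz.csRGB, pix)
        pix.save(f"page{page_index}_img{img_index}.png")
```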
From my knowledge, what Unstructured does here is convert tables in PDFs into HTML tables, which are then potentially readable by the LLM. So this wouldn't apply to reading all images, but it might work well with tables. In my experience the layout of the table matters a lot too (e.g., whether it uses spacing vs. lines).
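A hedged sketch of that table-to-HTML behavior using the unstructured library directly; the strategy value and file name are assumptions:

```python
from unstructured.partition.pdf import partition_pdf

# With table-structure inference on, Table elements carry an HTML rendering
# in metadata that can be passed to the LLM as context.
elements = partition_pdf("report.pdf", strategy="hi_res", infer_table_structure=True)
for el in elements:
    if el.category == "Table":
        print(el.metadata.text_as_html)
```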
@Darthus I will try the option @GeoloeG suggested (you run Docker and then call it locally to parse). I am still learning, so I haven't built any evaluation to be sure which of the solutions is better.
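A sketch of that Docker option: run the unstructured-api container, then POST the PDF to it locally. The image name, port, and endpoint path are taken from the unstructured-api docs as best recalled, so treat them as assumptions; the file name is hypothetical.

```python
# First, in a shell (assumed image/tag):
#   docker run -p 8000:8000 quay.io/unstructured-io/unstructured-api:latest
import requests

with open("report.pdf", "rb") as f:  # hypothetical file
    resp = requests.post(
        "http://localhost:8000/general/v0/general",  # assumed partition endpoint
        files={"files": ("report.pdf", f, "application/pdf")},
        data={"strategy": "hi_res"},
    )
resp.raise_for_status()
for element in resp.json():
    print(element["type"], element["text"][:80])
```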