
Is there any chance to read the images and tables in .pdf files?

Is there any chance to read the images and tables in .pdf files? For instance, if I use GPT-4 as the model for my service context? As of now I am using GPT-3.5-turbo.
13 comments
I read it, yes. The conclusion is negative as far as I understand. The only usable way, in my opinion and for my use case, is a text-only vector store with GPT-4V descriptions.
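A minimal sketch of that text-only idea, assuming the images have already been extracted from the PDF as files; the vision model name, import paths, and file paths below are assumptions, not something confirmed in the thread:

```python
# Sketch: caption each extracted image with GPT-4V, then index only the text captions.
import base64
from openai import OpenAI
from llama_index import Document, VectorStoreIndex  # pre-0.10 import path assumed

client = OpenAI()

def describe_image(path: str) -> str:
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4-vision-preview",  # assumed vision-capable model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in detail, including any table contents."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
        max_tokens=300,
    )
    return resp.choices[0].message.content

# Index the captions as plain text documents alongside the PDF text.
captions = [describe_image(p) for p in ["figures/page1_img1.png"]]  # hypothetical paths
index = VectorStoreIndex.from_documents([Document(text=c) for c in captions])
```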
Is SimpleDirectoryReader sufficient for parsing PDFs, or should I use a dedicated loader?
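For reference, the default SimpleDirectoryReader path looks roughly like this; it extracts plain text per page, so images are dropped and table layout is largely lost (the directory name is hypothetical):

```python
from llama_index import SimpleDirectoryReader, VectorStoreIndex

# SimpleDirectoryReader picks a default parser per file type (a basic PDF text
# extractor for .pdf), so it handles text-heavy PDFs but not images.
docs = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(docs)
print(docs[0].text[:500])  # inspect what the default parser actually extracted
```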
Give this a try: https://github.com/nlmatics/llmsherpa

Not perfect, but already an impressive result!
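A rough sketch of the llmsherpa flow based on its README; the API URL is the public one from that README, the import paths assume a pre-0.10 llama_index, and the file name is hypothetical:

```python
from llmsherpa.readers import LayoutPDFReader
from llama_index import Document, VectorStoreIndex

# LayoutPDFReader sends the PDF to the nlm-ingestor service and returns
# layout-aware chunks (sections, tables) that can be indexed as text.
llmsherpa_api_url = "https://readers.llmsherpa.com/api/document/developer/parseDocument?renderFormat=all"
reader = LayoutPDFReader(llmsherpa_api_url)
doc = reader.read_pdf("report.pdf")  # hypothetical file

index = VectorStoreIndex([])
for chunk in doc.chunks():
    index.insert(Document(text=chunk.to_context_text(), extra_info={}))
```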
@LoLiPoPMaN Also check this out; I'm also trying to get PDF table reading to work: https://www.youtube.com/live/oa82yoJ6zYc?si=W2z4dPQsnRCJwHMc
Thanks for sharing!
It uses Unstructured.io, which they said in the video works better than OCR: https://llamahub.ai/l/file-unstructured
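A sketch of that LlamaHub loader in use; the loader name and call signature are as recalled from LlamaHub at the time and may have changed, and the file name is hypothetical:

```python
from llama_index import VectorStoreIndex, download_loader

# download_loader fetches UnstructuredReader, which uses Unstructured.io to
# partition the PDF into elements (text, titles, tables) before indexing.
UnstructuredReader = download_loader("UnstructuredReader")
loader = UnstructuredReader()
docs = loader.load_data(file="report_with_tables.pdf")
index = VectorStoreIndex.from_documents(docs)
```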
Will check it out. Appreciate it!
Ping me back if you have success I'm also curious. πŸ™‚
Will do... Let's keep the conversation in this thread? I'll probably look at it tomorrow.
I am struggling to read the images that are in my PDF, and I have been trying techniques similar to this. Do you think I'm on the right track, or would you have a better approach to getting images out of PDFs?
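One common approach (a sketch, not something endorsed in the thread) is to pull the embedded images out with PyMuPDF first, then caption them with a vision model as described above; the input file name is hypothetical:

```python
import fitz  # PyMuPDF

doc = fitz.open("input.pdf")  # hypothetical file
for page_index, page in enumerate(doc):
    for img_index, img in enumerate(page.get_images(full=True)):
        xref = img[0]
        pix = fitz.Pixmap(doc, xref)
        if pix.n - pix.alpha > 3:  # convert CMYK and similar to RGB before saving
            pix = fitz.Pixmap(fitz.csRGB, pix)
        pix.save(f"page{page_index}_img{img_index}.png")
```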
From my knowledge, what Unstructured does here is convert tables in PDFs into HTML tables, which are then potentially readable by the LLM. So this wouldn't apply to reading all images, but it might work well with tables. In my experience the layout of the table matters a lot too (e.g., whether it uses spacing vs. lines).
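A hedged sketch of that table-to-HTML behavior using the unstructured library directly; the strategy value and file name are assumptions:

```python
from unstructured.partition.pdf import partition_pdf

# With table-structure inference on, Table elements carry an HTML rendering
# in metadata that can be passed to the LLM as context.
elements = partition_pdf("report.pdf", strategy="hi_res", infer_table_structure=True)
for el in elements:
    if el.category == "Table":
        print(el.metadata.text_as_html)
```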
@Darthus I will try the option @GeoloeG suggested (you run Docker and then call it locally to parse). I am still learning, so I haven't built any evaluation to be sure which of the solutions is better.
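A sketch of that Docker option: run the unstructured-api container, then POST the PDF to it locally. The image name, port, and endpoint path are taken from the unstructured-api docs as best recalled, so treat them as assumptions; the file name is hypothetical.

```python
# First, in a shell (assumed image/tag):
#   docker run -p 8000:8000 quay.io/unstructured-io/unstructured-api:latest
import requests

with open("report.pdf", "rb") as f:  # hypothetical file
    resp = requests.post(
        "http://localhost:8000/general/v0/general",  # assumed partition endpoint
        files={"files": ("report.pdf", f, "application/pdf")},
        data={"strategy": "hi_res"},
    )
resp.raise_for_status()
for element in resp.json():
    print(element["type"], element["text"][:80])
```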