Is there any chance of reading the images and tables in .pdf files? For instance, if I use GPT-4 as the model for my service context? As of now I am using GPT-3.5-turbo.
Yes, I read it. The conclusion is negative, as far as I understand. For my use case, the only workable approach is a text-only vector store built from GPT-4V descriptions of the images.
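To make that concrete, here is a minimal sketch of the description step, assuming the OpenAI chat-completions API with a vision-capable model. The function name, prompt wording, and model string are my own illustration, not a fixed recipe: you extract each image from the PDF, ask GPT-4V for a text description, and then index only that text in your vector store.

```python
import base64

def describe_image_request(image_bytes: bytes,
                           model: str = "gpt-4-vision-preview") -> dict:
    """Build a chat-completions payload asking a vision model to
    describe one image. Only the returned *text* description would be
    embedded into the text-only vector store; the image itself is
    never indexed."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Describe this image in detail for retrieval."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }
        ],
        "max_tokens": 300,
    }

# Placeholder bytes; in practice these come from the PDF's extracted images.
payload = describe_image_request(b"\x89PNG\r\n\x1a\n")
```

You would send the payload with something like `openai.chat.completions.create(**payload)` and store the response text as a plain text node, so GPT-3.5-turbo can still answer over it at query time.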
I am struggling to read the images in my PDF and have been trying techniques similar to this. Do you think I'm on the right track, or would you have a better approach for getting images out of PDFs?
From what I know, what unstructured does (for this) is convert tables in PDFs into HTML tables, which are then potentially readable by the LLM. So this wouldn't apply to reading all images, but it might work well for tables. In my experience the layout of the table matters a lot too (e.g. whether it uses whitespace or ruled lines to separate columns).
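For context, this is the kind of transformation unstructured aims at; if I recall correctly it exposes the HTML via `element.metadata.text_as_html` when you call `partition_pdf` with `infer_table_structure=True`. The sketch below (my own illustration with made-up cell values, not unstructured's code) shows why the HTML form helps: a whitespace-aligned table is ambiguous to an LLM, while HTML makes the row/column structure explicit.

```python
# Made-up example rows standing in for cells recovered from a PDF table.
rows = [
    ["Region", "Q1", "Q2"],
    ["North", "120", "135"],
    ["South", "98", "110"],
]

def rows_to_html(rows):
    """Render a list of rows as an HTML table, making the row/column
    boundaries explicit instead of relying on whitespace alignment."""
    body = "".join(
        "<tr>" + "".join(f"<td>{cell}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table>{body}</table>"

html_table = rows_to_html(rows)
print(html_table)
```

The HTML string can then be dropped into the LLM's context as-is; the tags tell the model exactly which values belong to which column, which whitespace layout often fails to do.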
@Darthus I will try the option @GeoloeG suggested (run it in Docker and then call it locally to parse). I am still learning, so I haven't built any evaluation to determine which of the solutions is better.