Hi could anyone help me on extracting

rajandevkota.

Hi, could anyone help me extract image content from a PDF for RAG? I tried ImageVisionLLMReader but couldn't find the proper file_extractor format for it. Any suggestions would be helpful. Since this is enterprise data, I have to use open-weights models. @everyone

9 comments

WhiteFang_Jr

Have you tried LlamaParse?
https://github.com/run-llama/llama_parse
It can help you extract context from images.

rajandevkota.

Isn't this a paid service? I can't use a paid service because this data can't be sent to another cloud. I want something built on open-source models and services.

WhiteFang_Jr

Then it'll be a bit tricky! You'll have to use an open-source model like LLaVA to extract what is present in the images first.

There is a TOS for LlamaParse that you can read, though; it retains your data for only 48 hours.

rajandevkota.

Thank you for your response. I can host an open-source vision LLM, but I can't figure out which LlamaIndex module supports that. I am following this https://llamahub.ai/l/readers/llama-index-readers-file?from= but couldn't work out the file extractor for ImageVisionLLMReader. Could you please help me with it?

WhiteFang_Jr

If the module is not present, you can create a custom Reader class; it's very easy to implement.
https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/?h=simpledire#extending-to-other-file-types

WhiteFang_Jr

Also, in your case, if you are using ImageVisionLLMReader, it returns a list of Document objects that you can pass directly to create an index.
No need to pass it anywhere else.
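
For reference, `file_extractor` on `SimpleDirectoryReader` is a dict that maps a file extension to a reader instance. Here is a minimal sketch of the custom-reader route, assuming your self-hosted vision model sits behind a hypothetical `caption_fn` callable (file path in, caption string out). `format_caption`, `make_image_caption_reader` and `ImageCaptionReader` are illustrative names, not existing LlamaIndex modules:

```python
from pathlib import Path
from typing import Callable, Optional


def format_caption(file_name: str, caption: str) -> str:
    """Render a caption as a text line that can be indexed like any other text."""
    return f"[Image: {file_name}] {caption.strip()}"


def make_image_caption_reader(caption_fn: Callable[[Path], str]):
    """Build a reader that turns each image file into one captioned Document.

    caption_fn is a hypothetical hook: wrap your own open-weights model
    (LLaVA, MiniCPM-V, BLIP-2, ...) so it takes a path and returns a string.
    """
    # Imported lazily so format_caption stays usable without llama-index installed.
    from llama_index.core import Document
    from llama_index.core.readers.base import BaseReader

    class ImageCaptionReader(BaseReader):
        def load_data(self, file, extra_info: Optional[dict] = None):
            path = Path(file)
            text = format_caption(path.name, caption_fn(path))
            # load_data must return a list of Document objects
            return [Document(text=text, metadata=extra_info or {})]

    return ImageCaptionReader()
```

With `reader = make_image_caption_reader(my_caption_fn)`, the call would look like `SimpleDirectoryReader("./data", file_extractor={".png": reader, ".jpg": reader}).load_data()`. Note that this only covers standalone image files; images embedded in a PDF would have to be extracted first.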

rajandevkota.

Yeah, I got your point, but I couldn't find any existing module that extracts information from the graphs or images present in the documents. Is there any way to do it?

jimmy6dof

You might want to look at ColPali for image content search (index your image/PDF library for search). For per-image captioning and/or OCR, there is the new MiniCPM-V-2.6. Both are small enough to self-host, have open licenses, and can be included in LlamaIndex pipelines.

rajandevkota.

Actually, what I am looking for is something that appends the caption of each image to the end of the page content, just like pdf_partition appends the OCR text of an image to the end of the page. I can use BLIP2 for captioning. Just wondering if LlamaIndex has something for this.
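
To make the append step concrete, here is a minimal sketch in plain Python. It assumes the page text and the page's images have already been extracted from the PDF, and `caption_fn` is a hypothetical wrapper around BLIP2 (or any other captioning model). `PageContent` and `append_captions` are illustrative names, not LlamaIndex APIs:

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PageContent:
    """Text and raw image payloads extracted from one PDF page."""
    text: str
    images: List[bytes] = field(default_factory=list)


def append_captions(page: PageContent, caption_fn: Callable[[bytes], str]) -> str:
    """Return the page text with one caption line per image appended at the end."""
    lines = [page.text.rstrip()]
    for number, image in enumerate(page.images, start=1):
        caption = caption_fn(image).strip()
        if caption:  # skip images for which the model returned nothing
            lines.append(f"[Image {number}] {caption}")
    return "\n".join(lines)
```

The returned string can then be used as the text of one LlamaIndex `Document` per page, so the captions get chunked and embedded together with the surrounding page content.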
