Hi, could anyone help me with extracting image content from a PDF for RAG? I tried ImageVisionLLMReader but couldn't find the proper file_extractor format for it. Any suggestions would be helpful. Since this is enterprise data, I am supposed to use open-weights models. @everyone
Isn't this a paid service? I can't use a paid service, as this data can't be sent to another cloud. I want something that works with open-source models and services.
Thank you for your response. I can host an open-source vision LLM, but I am not able to find which LlamaIndex module supports that. I am following this https://llamahub.ai/l/readers/llama-index-readers-file?from= but couldn't get the file extractor for ImageVisionLLMReader working. Could you please help me with it?
Also, in your case, if you are using ImageVisionLLMReader, it returns a list of Document objects that you can pass directly to create an index. No need to pass it anywhere else.
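For the file_extractor format, here is a minimal sketch, assuming the `llama-index-readers-file` package is installed. file_extractor is a dict that maps a file extension (with the leading dot) to a reader instance. `build_file_extractor` and `load_image_documents` are hypothetical helper names for illustration, and the extension list is an assumption. As far as I know, ImageVisionLLMReader loads a BLIP-2 checkpoint locally through transformers, so please verify that against your installed version.

```python
from typing import Dict, Iterable

# Assumed set of image extensions; adjust to the files you actually have.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")


def build_file_extractor(
    reader: object, extensions: Iterable[str] = IMAGE_EXTENSIONS
) -> Dict[str, object]:
    """Map each file extension (with leading dot) to the same reader instance."""
    return {ext.lower(): reader for ext in extensions}


def load_image_documents(input_dir: str):
    """Load every image in input_dir as captioned Document objects (hypothetical helper)."""
    # Imported lazily so the helper above can be used without llama-index installed.
    from llama_index.core import SimpleDirectoryReader
    from llama_index.readers.file import ImageVisionLLMReader

    # Assumption: constructing the reader loads a local captioning model,
    # so no data leaves the machine once the weights are cached.
    reader = ImageVisionLLMReader()
    extractor = build_file_extractor(reader)
    return SimpleDirectoryReader(
        input_dir=input_dir, file_extractor=extractor
    ).load_data()
```

The returned list can then go to something like `VectorStoreIndex.from_documents(documents)`. Note that this only covers standalone image files in the folder, and that the embedding model should also be set to a local one if nothing may leave your environment.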
Yeah, I got your point, but I couldn't find any existing module that extracts information from the graphs or images present in the documents. Is there any way to do it?
You might want to look at ColPali for image content search (index your image/PDF library for search), or, for per-image captioning and/or OCR, there is the new MiniCPM-V-2.6. Both are small enough to self-host and have open licenses, and you can include both in LlamaIndex pipelines.
Actually, what I am looking for is something that appends the caption of each image to the end of the page content, just like pdf_partition adds the OCR text of an image to the end of the page. I can use BLIP2 for captioning. Just wondering if LlamaIndex has something for this.
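In case it helps anyone reading later, the "append captions to the end of the page" step can be sketched in plain Python, independent of any LlamaIndex module. This is only a rough sketch: `append_image_captions` is a hypothetical helper, `caption_fn` stands in for whatever captioning model is used (BLIP2 or otherwise), and the page texts and per-page image bytes are assumed to have been extracted from the PDF beforehand. The `[Image caption]` prefix is an arbitrary choice.

```python
from typing import Callable, Dict, List, Sequence


def append_image_captions(
    page_texts: Sequence[str],
    images_by_page: Dict[int, List[bytes]],
    caption_fn: Callable[[bytes], str],
) -> List[str]:
    """Return page texts with one caption line per image appended at the end of each page."""
    enriched: List[str] = []
    for page_index, text in enumerate(page_texts):
        # Caption every image found on this page (none if the page has no images).
        captions = [caption_fn(img) for img in images_by_page.get(page_index, [])]
        # Skip empty captions and tag the rest so they are recognizable in the text.
        caption_lines = [f"[Image caption] {c.strip()}" for c in captions if c.strip()]
        enriched.append("\n".join([text.rstrip(), *caption_lines]))
    return enriched


# Usage with a stand-in captioner; a real one would run the vision model on the bytes.
pages = ["Revenue grew in Q2.", "No figures here.\n"]
images = {0: [b"chart-bytes"]}
enriched_pages = append_image_captions(pages, images, lambda _img: " a bar chart ")
```

Each enriched page string can then be wrapped in a LlamaIndex `Document(text=...)` and indexed as usual, so the captions become searchable alongside the page text.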