At a glance

The community members are discussing ways to parse complex PDFs and extract tables, images, and text, potentially using large language models (LLMs) like GPT-4o or a locally deployed multimodal model. One community member suggests using LlamaParse for this task. Another asks whether they can use their locally deployed model; the response is that they can, but it may take longer. The discussion also touches on the cost of using a service like LlamaParse, especially for a large dataset of 1 TB. The community members explore different approaches, such as sending PDF pages to a multimodal LLM and prompting it to extract the desired information. However, there is no explicitly marked answer in the provided information.

Is there a way to parse a complex PDF, extracting tables, images, and text, maybe even with an LLM (GPT-4o or maybe a local multimodal one)?
Maybe while reading the files?
@Logan M
I mean, this is what LlamaParse does.
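
For reference, a minimal sketch of what that looks like, assuming the llama-parse Python package is installed and a LLAMA_CLOUD_API_KEY is set in the environment (the file name is illustrative):

```python
# Minimal LlamaParse sketch: parse a complex PDF into documents.
# Assumes `pip install llama-parse` and LLAMA_CLOUD_API_KEY in the environment.
from llama_parse import LlamaParse

parser = LlamaParse(result_type="markdown")   # markdown keeps tables as tables
documents = parser.load_data("complex.pdf")   # illustrative file name

for doc in documents:
    print(doc.text[:500])
```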

If you want, you can send a PDF page by page to a multimodal LLM and prompt it to extract too.
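
A rough sketch of that page-by-page approach, assuming PyMuPDF for rendering pages and the OpenAI client for GPT-4o (the prompt, dpi, and file name are illustrative, not a fixed recipe):

```python
# Render each PDF page to an image and ask a multimodal LLM to transcribe it.
# Assumes `pip install pymupdf openai` and OPENAI_API_KEY in the environment.
import base64

import fitz  # PyMuPDF
from openai import OpenAI

client = OpenAI()
doc = fitz.open("complex.pdf")  # illustrative file name

for page in doc:
    pix = page.get_pixmap(dpi=150)  # rasterize the page
    b64 = base64.b64encode(pix.tobytes("png")).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Extract all text and tables from this page as markdown."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)
```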
Can I use my locally deployed model?
Yeah, why not. You just need to send it an image of each page and prompt it to extract.
Still, I'd need to pay for LlamaParse, and I'm working with 1 TB of data.
The cost is going to skyrocket.
The PDFs are like these; I have 4 A100s to run some open-source LLM.
Any suggestion on the approach I should follow?
I meant if you have a local multimodal LLM running, you could use that instead. It just might take a while.
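
If the local model is served behind an OpenAI-compatible endpoint (vLLM can expose one for many open multimodal models), the same page loop works by swapping the client; the base URL and model id below are placeholders for whatever you deploy:

```python
# Same page-by-page idea, pointed at a locally served multimodal model.
# Assumes a vLLM (or similar) server exposing the OpenAI-compatible API;
# base_url and model id are placeholders, not a specific recommendation.
import base64

import fitz  # PyMuPDF
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
doc = fitz.open("complex.pdf")  # illustrative file name

for page in doc:
    b64 = base64.b64encode(page.get_pixmap(dpi=150).tobytes("png")).decode()
    resp = client.chat.completions.create(
        model="Qwen/Qwen2-VL-7B-Instruct",  # placeholder: use your deployed model id
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Extract all text and tables from this page as markdown."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)
```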
Anything is going to be expensive with 1 TB of data, my guy lol
Do you have any suggested blog posts on multimodal extraction or something similar to this?