At a glance

The community members are discussing ways to parse complex PDFs and extract tables, images, and text, potentially using large language models (LLMs) like GPT-4o or a locally deployed multimodal model. One community member suggests using LlamaParse for this task. Another asks whether they can use their locally deployed model; the response is that they can, but it may take longer. The discussion also touches on the cost of using a service like LlamaParse, especially for a large dataset of 1 TB. The community members explore different approaches, such as sending PDF pages to a multimodal LLM and prompting it to extract the desired information. However, there is no explicitly marked answer in the provided information.

Is there a way to parse a complex PDF, extracting tables, images, and text, maybe even with an LLM (GPT-4o or maybe a local multimodal one)?
Maybe while reading the files?
@Logan M
I mean, this is what LlamaParse does.
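
For reference, a minimal sketch of what that looks like, assuming the llama-parse Python package is installed and a LLAMA_CLOUD_API_KEY is set in the environment (the file name is illustrative):

```python
# Minimal LlamaParse sketch: parse a complex PDF into documents.
# Assumes `pip install llama-parse` and LLAMA_CLOUD_API_KEY in the environment.
from llama_parse import LlamaParse

parser = LlamaParse(result_type="markdown")   # markdown keeps tables as tables
documents = parser.load_data("complex.pdf")   # illustrative file name

for doc in documents:
    print(doc.text[:500])
```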

If you want, you can send a PDF page by page to a multimodal LLM and prompt it to extract too.
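
A rough sketch of that page-by-page approach, assuming PyMuPDF for rendering pages and the OpenAI client for GPT-4o (the prompt, dpi, and file name are illustrative, not a fixed recipe):

```python
# Render each PDF page to an image and ask a multimodal LLM to transcribe it.
# Assumes `pip install pymupdf openai` and OPENAI_API_KEY in the environment.
import base64

import fitz  # PyMuPDF
from openai import OpenAI

client = OpenAI()
doc = fitz.open("complex.pdf")  # illustrative file name

for page in doc:
    pix = page.get_pixmap(dpi=150)  # rasterize the page
    b64 = base64.b64encode(pix.tobytes("png")).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Extract all text and tables from this page as markdown."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)
```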
Can I use my locally deployed model?
Yeah, why not. You just need to send it an image of each page and prompt it to extract.
Still, I'd need to pay for LlamaParse, and I'm working with 1 TB of data.
The cost is going to skyrocket.
The PDFs are like these; I have 4 A100s to run some open-source LLM.
Any suggestion on the approach I should follow?
I meant if you have a local multimodal LLM running, you could use that instead. It just might take a while.
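
If the local model is served behind an OpenAI-compatible endpoint (vLLM can expose one for many open multimodal models), the same page loop works by swapping the client; the base URL and model id below are placeholders for whatever you deploy:

```python
# Same page-by-page idea, pointed at a locally served multimodal model.
# Assumes a vLLM (or similar) server exposing the OpenAI-compatible API;
# base_url and model id are placeholders, not a specific recommendation.
import base64

import fitz  # PyMuPDF
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
doc = fitz.open("complex.pdf")  # illustrative file name

for page in doc:
    b64 = base64.b64encode(page.get_pixmap(dpi=150).tobytes("png")).decode()
    resp = client.chat.completions.create(
        model="Qwen/Qwen2-VL-7B-Instruct",  # placeholder: use your deployed model id
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Extract all text and tables from this page as markdown."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)
```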
Anything is going to be expensive with 1 TB of data, my guy lol
Do you have any suggested blog posts on multimodal extraction or something similar to this?