
Hi everyone, I'm working on a LlamaIndex PDF Q&A system for documents that contain tables. I need to build it with an open-source model rather than OpenAI. If there are any resources, kindly share!
Thank you.


WhiteFang_Jr

Table extraction is often poor with PDF parsers that read tables from the text layer; they sometimes miss edge cases.

You could try using https://llamahub.ai/l/nougat_ocr

It uses OCR to extract the information in tables.
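If the OCR step gives you tables as Markdown pipe tables (an assumption; check what your parser actually emits), one way to make table facts retrievable is to turn each row into a self-describing line that repeats the column headers. A minimal stdlib sketch; `table_rows_to_text` is a hypothetical helper, not part of LlamaIndex or the Nougat loader:

```python
def table_rows_to_text(markdown_table: str, caption: str = "") -> list[str]:
    """Turn a Markdown pipe table into one self-describing text line per data row."""
    lines = [ln.strip() for ln in markdown_table.strip().splitlines() if ln.strip()]
    # Drop the outer pipes, then split the remaining cells on "|".
    rows = [[cell.strip() for cell in ln.strip("|").split("|")] for ln in lines]
    header, body = rows[0], rows[1:]
    # Skip the header separator row such as |---|:---:|.
    body = [r for r in body if not all(set(cell) <= set("-: ") for cell in r)]
    prefix = f"[{caption}] " if caption else ""
    # Pair every value with its column header so each row can be retrieved on its own.
    return [prefix + "; ".join(f"{h}: {v}" for h, v in zip(header, r)) for r in body]
```

Each returned line can be indexed as its own chunk, so a question about a specific year or metric can match the row that carries both the header and the value. The sketch assumes a simple table with no "|" characters inside cells and no merged cells.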

For the rest, response generation and embeddings, you can use open-source models. LlamaIndex supports a variety of LLMs.

https://docs.llamaindex.ai/en/stable/module_guides/models/llms.html#modules
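The retrieval half of RAG works the same whichever model provider you use. To show the mechanics without downloading a model, this sketch substitutes a bag-of-words count vector for a real open-source embedding model (that substitution is the assumption; in practice you would plug an embedding model into LlamaIndex) and ranks chunks by cosine similarity:

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for an open-source embedding model: lowercase term counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Dot product over shared terms, divided by the product of vector lengths.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    # Rank every chunk by similarity to the query and keep the best top_k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]
```

A real embedding model captures meaning rather than exact word overlap, but the rank-and-take-top-k step that feeds the LLM stays the same.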

Ar#9696

Thanks for the reply.
Further, I want to build a Q&A system over a PDF (an annual report) where questions may be about the tables, and it should still answer them. Is that possible with LlamaIndex?

WhiteFang_Jr

Yes

WhiteFang_Jr

I haven't tried the OCR PDF parser myself, but I think someone has, specifically for tables.

Ar#9696

I tried some sample code with a .txt file as input, and the answers contain too many hallucinations. I used the open-source Microsoft Phi model.

Ar#9696

Will try today.

Ar#9696

If you have any sources, please share.

WhiteFang_Jr

Ah, smaller LLMs, especially open-source ones, are not good at following instructions.

LlamaIndex has published a compatibility report that rates several LLMs on various factors: https://docs.llamaindex.ai/en/stable/module_guides/models/llms.html#llm-compatibility-tracking
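Beyond picking a model that does well on that report, a common mitigation for hallucination with small models is a strict prompt that restricts the answer to the retrieved context, plus a cheap post-check. This is an illustrative sketch, not a LlamaIndex API: `build_grounded_prompt` and `unsupported_numbers` are hypothetical helpers, a small model may still ignore the instructions, and the number check only catches invented figures, not every kind of error.

```python
import re


def build_grounded_prompt(question: str, contexts: list[str]) -> str:
    """Number the retrieved chunks and tell the model to answer only from them."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(contexts, start=1))
    return (
        "Answer the question using ONLY the context below.\n"
        "If the answer is not in the context, reply exactly: I don't know.\n"
        "Copy numbers exactly as they appear in the context.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )


NUMBER = re.compile(r"\d[\d,.]*")


def unsupported_numbers(answer: str, contexts: list[str]) -> set[str]:
    """Return numbers in the answer that never appear in the retrieved context."""
    def numbers(text: str) -> set[str]:
        # Strip trailing sentence punctuation so "1,450." matches "1,450".
        return {n.rstrip(".,") for n in NUMBER.findall(text)}
    return numbers(answer) - numbers(" ".join(contexts))
```

For an annual report, a non-empty result from `unsupported_numbers` is a reasonable signal to reject or re-ask rather than show the answer.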

Ar#9696

Okay, got it.

Ar#9696

Basically, I want to try Q&A over a PDF that has some paragraphs and complex tables. So the approach here would be OCR plus a normal RAG pipeline, right?

WhiteFang_Jr

Yes

WhiteFang_Jr

That should work
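The OCR + RAG combination can be sketched as a small pipeline in which the OCR engine, the retriever, and the open-source LLM are passed in as plain callables. Those callables are hypothetical placeholders (for example a wrapper around the Nougat loader and a local model). The only concrete logic here is splitting the OCR output on blank lines, so that a Markdown table, which has no blank lines inside it, stays together as one chunk.

```python
from typing import Callable


def split_blocks(markdown: str) -> list[str]:
    """Split OCR'd Markdown on blank lines; a pipe table has none inside, so it stays whole."""
    blocks: list[str] = []
    current: list[str] = []
    for line in markdown.splitlines():
        if line.strip():
            current.append(line)
        elif current:  # a blank line closes the current paragraph or table
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))
    return blocks


def answer_pdf_question(
    question: str,
    pages: list[bytes],
    ocr: Callable[[bytes], str],                      # hypothetical OCR wrapper returning Markdown
    retrieve: Callable[[str, list[str]], list[str]],  # hypothetical retriever (embeddings + top-k)
    llm: Callable[[str], str],                        # hypothetical open-source LLM wrapper
) -> str:
    # 1. OCR every page; the ocr callable is assumed to emit tables as Markdown.
    markdown = "\n\n".join(ocr(page) for page in pages)
    # 2. Chunk into paragraphs and whole tables.
    chunks = split_blocks(markdown)
    # 3. Normal RAG: retrieve relevant chunks, then generate from them only.
    context = "\n\n".join(retrieve(question, chunks))
    prompt = (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer using only the context above."
    )
    return llm(prompt)
```

Keeping the three stages as separate callables makes it easy to swap the OCR parser or the open-source model without touching the chunking and prompting logic.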
