Updated 5 months ago

hi! does anyone know how to correctly

At a glance

A community member is having trouble importing the SmartPDFLoader from the llama_index.readers.smart_pdf_loader module. Another community member suggests that the issue might be related to not having the package installed. A link to the documentation for the SmartPDFLoader is provided.

The original poster responds that they think they have the package installed, along with Llmsherpa. They add that they ended up using specific readers with SimpleDirectoryReader instead (PyMuPDF and Markdown reader extractors), with semantic chunking for the PyMuPDF output and MarkdownNodeParser for the Markdown output. They ask whether smart-pdf-loader might be a better choice, or at least worth trying, for PDFs.

Another community member asks whether SimpleDirectoryReader also supports table detection and OCR.

Hi! Does anyone know how to correctly import SmartPDFLoader? I tried

'from llama_index.readers.smart_pdf_loader import SmartPDFLoader'

but it doesn't seem to work.
4 comments
Did you install it though?
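One way to answer that question from inside the same Python environment is to check whether the modules can be located at all. The snippet below is a minimal sketch, not an official diagnostic: `is_importable` is a hypothetical helper, and the pip package name in the comment (`llama-index-readers-smart-pdf-loader`) is an assumption based on the usual LlamaIndex naming for reader packages.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate module_name in this environment."""
    try:
        # find_spec imports parent packages first, so a missing parent
        # (e.g. no llama_index at all) raises ModuleNotFoundError.
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    # Assumption: the reader ships separately, e.g. via
    #   pip install llama-index-readers-smart-pdf-loader llmsherpa
    for name in ("llama_index.readers.smart_pdf_loader", "llmsherpa"):
        status = "found" if is_importable(name) else "missing"
        print(f"{name}: {status}")
```

If the first module shows as missing, the import path in the question is probably fine and the package is simply not installed in the interpreter that is running the code (a different virtual environment or notebook kernel is a common cause).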
Hey, thanks for the reply. I think so, even Llmsherpa.

Anyway I'll try again, maybe I made a mistake.

In any case, in the end I opted for specific readers with SimpleDirectoryReader, in my case the PyMuPDF and Markdown reader extractors. For chunking, I used semantic chunking for the PyMuPDF output and MarkdownNodeParser for the Markdown output. Do you think smart-pdf-loader might be a better choice, or at least worth trying (for PDFs, obviously)?
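For readers who want to try the setup described in this reply, here is a rough sketch of how it could look. It is not the poster's actual code: `build_nodes`, `input_dir`, and the splitter settings are illustrative assumptions, and it assumes `llama-index-core`, `llama-index-readers-file`, and `pymupdf` are installed, and that you pass in an embedding model of your choice.

```python
def build_nodes(input_dir: str, embed_model):
    """Load PDFs and Markdown files separately and chunk each with its own parser."""
    # Imports are local so this file can be loaded even without llama-index installed.
    from llama_index.core import SimpleDirectoryReader
    from llama_index.core.node_parser import (
        MarkdownNodeParser,
        SemanticSplitterNodeParser,
    )
    from llama_index.readers.file import MarkdownReader, PyMuPDFReader

    # PDFs go through PyMuPDF; load_data raises ValueError if no files match.
    pdf_docs = SimpleDirectoryReader(
        input_dir=input_dir,
        required_exts=[".pdf"],
        file_extractor={".pdf": PyMuPDFReader()},
    ).load_data()

    # Markdown files go through the Markdown reader.
    md_docs = SimpleDirectoryReader(
        input_dir=input_dir,
        required_exts=[".md"],
        file_extractor={".md": MarkdownReader()},
    ).load_data()

    # Semantic chunking for PDF text (threshold values are illustrative).
    semantic_parser = SemanticSplitterNodeParser(
        buffer_size=1,
        breakpoint_percentile_threshold=95,
        embed_model=embed_model,
    )
    # Heading-based chunking for Markdown.
    markdown_parser = MarkdownNodeParser()

    pdf_nodes = semantic_parser.get_nodes_from_documents(pdf_docs)
    md_nodes = markdown_parser.get_nodes_from_documents(md_docs)
    return pdf_nodes + md_nodes
```

Loading the two file types in separate passes keeps each document set paired with its own parser, at the cost of scanning the directory twice.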
It's an interesting question. Does SimpleDirectoryReader also do table detection and OCR?