A community member is having trouble importing SmartPDFLoader from the llama_index.readers.smart_pdf_loader module. Another community member suggests that the package might not be installed and links to the SmartPDFLoader documentation.
The original poster responds that they think the package is installed, along with llmsherpa, and that they will try again. They add that they ended up using specific readers with SimpleDirectoryReader instead, namely the PyMuPDF and Markdown reader extractors, with semantic chunking for the PyMuPDF output and MarkdownNodeParser for the Markdown files. They ask how that setup compares with smart-pdf-loader for PDFs: whether one is the better choice, or whether smart-pdf-loader is at least worth trying.
Another community member asks whether SimpleDirectoryReader also supports table detection and OCR.
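For the import problem itself, one quick check is whether Python can locate the modules at all in the active environment. The sketch below uses only the standard library. The pip package names in the comments (llama-index-readers-smart-pdf-loader and llmsherpa) are assumptions based on how these packages are usually published, and the helper names are made up for illustration.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the module can be located without importing it fully."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # A missing parent package (e.g. llama_index itself) raises
        # ModuleNotFoundError, which is a subclass of ImportError.
        return False


def missing_modules(module_names: list[str]) -> list[str]:
    """Return the subset of module_names that cannot be located."""
    return [name for name in module_names if not is_importable(name)]


if __name__ == "__main__":
    # Assumed pip packages: llama-index-readers-smart-pdf-loader, llmsherpa
    wanted = ["llama_index.readers.smart_pdf_loader", "llmsherpa"]
    for name in missing_modules(wanted):
        print(f"not found in this environment: {name}")
```

If a module is reported as not found even though pip says it is installed, the usual suspect is that pip and the running interpreter belong to different environments.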
Hey, thanks for the reply. I think so, even Llmsherpa.
Anyway I'll try again, maybe I made a mistake.
In any case, in the end I opted for specific readers for SimpleDirectoryReader, in my case the PyMuPDF and MD reader extractors. For chunking, I used semantic chunking for PyMuPDF and the MarkdownNodeParser for MD. Do you think it might be a better choice, or at least worth trying, for smart-pdf-loader (obviously for PDFs)?
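For readers following along, the setup described in this message could look roughly like the sketch below. It is a guess at the configuration, not the poster's actual code: the function name build_nodes is hypothetical, the embedding model is left as a parameter because the thread does not say which one was used, and it assumes llama-index-core, llama-index-readers-file and pymupdf are installed.

```python
def build_nodes(input_dir: str, embed_model):
    """Load PDFs and Markdown files with dedicated readers, then chunk each type separately."""
    # Imports are kept inside the function so the module loads even when
    # llama-index is not installed yet.
    from llama_index.core import SimpleDirectoryReader
    from llama_index.core.node_parser import (
        MarkdownNodeParser,
        SemanticSplitterNodeParser,
    )
    from llama_index.readers.file import MarkdownReader, PyMuPDFReader

    # One reader pass per file type, so each type can get its own chunker.
    # Note: SimpleDirectoryReader raises ValueError if no matching files exist.
    pdf_docs = SimpleDirectoryReader(
        input_dir=input_dir,
        required_exts=[".pdf"],
        file_extractor={".pdf": PyMuPDFReader()},
    ).load_data()
    md_docs = SimpleDirectoryReader(
        input_dir=input_dir,
        required_exts=[".md"],
        file_extractor={".md": MarkdownReader()},
    ).load_data()

    # Semantic chunking for PDF text; it needs an embedding model to find breakpoints.
    pdf_nodes = SemanticSplitterNodeParser(
        embed_model=embed_model
    ).get_nodes_from_documents(pdf_docs)

    # Header-aware chunking for Markdown.
    md_nodes = MarkdownNodeParser().get_nodes_from_documents(md_docs)

    return pdf_nodes + md_nodes
```

Calling build_nodes("./docs", embed_model) with any LlamaIndex embedding model should return one combined list of nodes, where "./docs" is a placeholder path.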