Updated 10 months ago

Why is this not working?

At a glance

The community members are discussing a problem loading data from a PDF file. The original post shows the error "AttributeError: 'str' object has no attribute 'name'". The comments explain that the reader needs a path object rather than a string to load the PDF correctly. After that fix the extracted text is still empty, and the members conclude that the PDF probably contains images instead of text, so they suggest OCR tools such as OCRmyPDF or HelloRAG to extract it. They also discuss LlamaParse, a service that is free for up to 1000 pages per day, as a potential solution.

Why is this not working?
pdf = PDFReader().load_data("./d.pdf")
print(pdf)

AttributeError: 'str' object has no attribute 'name'
metadata = {"page_label": page_label, "file_name": file.name}
20 comments
But I think the data is not loading. I am getting this response:
The context only mentions page labels and file names for a document named "d.pdf".
Actually, if you look at the code provided in the link above, it extracts metadata from the path object of that file, which is not possible with a string.
So how can I get the data?
Pass in the path object for the file location.
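The suggestion above can be sketched roughly as follows. This is a minimal example, assuming `PDFReader` is importable from `llama_index.readers.file` (the import path differs between llama-index versions); `as_path` and `load_pdf` are hypothetical helper names used only for illustration, not part of the library.

```python
from pathlib import Path


def as_path(location):
    """Convert a string (or Path) file location into a pathlib.Path."""
    return Path(location)


def load_pdf(location):
    """Load a PDF with PDFReader, passing a Path instead of a plain string."""
    # Assumption: this import path matches the installed llama-index version.
    from llama_index.readers.file import PDFReader

    return PDFReader().load_data(file=as_path(location))


# A Path has the .name attribute that the metadata line in the traceback reads;
# a plain str does not, which is what raises the AttributeError.
print(as_path("./d.pdf").name)
print(hasattr("./d.pdf", "name"))
```

Calling `load_pdf("./d.pdf")` would then hand the reader a `Path`, so `file.name` in the metadata line resolves to the file name.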
Done, but when I try to print the text I get an empty string, while the other details show up correctly. Maybe my PDF contains images rather than text?
I can't copy the text in the browser or in Adobe Reader either, so maybe it's an image.
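One way to check this suspicion programmatically is to test whether any page yields text at all. The sketch below assumes the pypdf package is installed and uses its `PdfReader` and `extract_text()`; `has_text_layer` and `extract_page_texts` are hypothetical helper names. A PDF made only of scanned images would typically produce empty strings for every page.

```python
def has_text_layer(page_texts):
    """Return True if at least one page yielded non-whitespace text."""
    return any(text and text.strip() for text in page_texts)


def extract_page_texts(pdf_path):
    """Extract the text of each page with pypdf (assumed to be installed)."""
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    # Fall back to "" so pages without extractable text are handled uniformly.
    return [page.extract_text() or "" for page in reader.pages]


# Example with already-extracted page texts:
print(has_text_layer(["", "  \n"]))
print(has_text_layer(["", "Some extracted text"]))
```

If `has_text_layer(extract_page_texts("./d.pdf"))` comes back `False`, the file most likely needs OCR before any reader can return text.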
So how can I get the text? Is there any reader with OCR?
LlamaParse is free for up to 1000 pages per day.
But I need a permanent solution; after some time this will be completely paid. Is there any other AI model?
I think 1000 pages per day is a pretty good deal.

If not, there is also Unstructured, which you can check out.
I managed to do this with OCRmyPDF.
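For reference, OCRmyPDF's basic command-line form is `ocrmypdf input.pdf output.pdf`. A minimal sketch of calling it from Python, assuming the `ocrmypdf` executable is installed and on the PATH; `build_ocr_command`, `add_text_layer`, and the output name `d_ocr.pdf` are hypothetical:

```python
import subprocess


def build_ocr_command(src, dst):
    """Build the OCRmyPDF command line: ocrmypdf <input> <output>."""
    return ["ocrmypdf", str(src), str(dst)]


def add_text_layer(src, dst):
    """Run OCRmyPDF; raises CalledProcessError if the tool reports a failure."""
    subprocess.run(build_ocr_command(src, dst), check=True)


print(build_ocr_command("d.pdf", "d_ocr.pdf"))
```

The resulting file can then be loaded with the PDF reader by passing its location as a path object, as suggested earlier in the thread.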
How's the quality?
In my case it gets 100% of the text, failing only where an image or symbol appears.
I think you can try HelloRAG. It excels at recognizing the content within images in PDFs with its advanced OCR. 🔗 https://github.com/HelloRAG