Updated 9 months ago

Why is this not working?

pdf = PDFReader().load_data("./d.pdf")
print(pdf)

AttributeError: 'str' object has no attribute 'name'
metadata = {"page_label": page_label, "file_name": file.name}
20 comments
But I think the data is not loading. I am getting this response:
The context only mentions page labels and file names for a document named "d.pdf".
Actually, if you look at the code provided in the link above, it extracts metadata from the path object of that file, which is not possible with a string.
So how can I get the data?
Pass in a path object for the file location instead of a string.
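For anyone landing here later, a minimal sketch of what that means, assuming the reader only needs something with a `.name` attribute (which is what the traceback suggests). The `load_pdf` helper is hypothetical, and the `llama_index.readers.file` import path is an assumption that may differ between llama-index versions:

```python
from pathlib import Path

# A plain string has no .name attribute, which is what the traceback complains about.
pdf_str = "./d.pdf"
print(hasattr(pdf_str, "name"))  # False

# A Path object does, so the reader can build its metadata from it.
pdf_path = Path(pdf_str)
print(pdf_path.name)  # d.pdf


def load_pdf(path):
    """Hypothetical helper: coerce the argument to a Path, then load it."""
    # Assumed import path; check the one that matches your installed version.
    from llama_index.readers.file import PDFReader

    return PDFReader().load_data(Path(path))
```

Calling `load_pdf("./d.pdf")` should then get past the `AttributeError`, although it will not help if the PDF has no text layer.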
Done, but when I try to print the text I get an empty string, although the other details show up correctly. Maybe my PDF contains images rather than text?
I can't copy the text in the browser or in Adobe Reader either, so it is probably an image.
So how can I get the text? Is there any reader with OCR?
Free for 1000 pages per day
But I need a permanent solution; after some time this will be completely paid. Is there any other AI model?
I think 1000 pages per day is a pretty good deal.

If not, there is also Unstructured that you can check out.
I managed to do this with OCRmyPDF.
How's the quality?
In my case it gets 100% of the text, failing only where an image or symbol appears.
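A rough sketch of how the OCRmyPDF step could be scripted, assuming the `ocrmypdf` command-line tool is installed and on PATH. The helper names and the output filename `d_ocr.pdf` are made up for illustration; `--skip-text` tells OCRmyPDF to leave pages that already have text alone:

```python
import shutil
import subprocess


def build_ocr_command(src, dst):
    """Build the OCRmyPDF command line that adds a text layer to src."""
    # --skip-text leaves pages that already contain text untouched.
    return ["ocrmypdf", "--skip-text", str(src), str(dst)]


def ocr_pdf(src, dst):
    """Run OCRmyPDF if it is installed; return True when the command ran."""
    if shutil.which("ocrmypdf") is None:
        return False  # OCRmyPDF is not on PATH
    subprocess.run(build_ocr_command(src, dst), check=True)
    return True


print(build_ocr_command("d.pdf", "d_ocr.pdf"))
```

The resulting `d_ocr.pdf` can then be passed to the PDF reader like any other text PDF.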
I think you can try HelloRAG. It excels at recognizing content inside images in PDFs with its advanced OCR. 🔗 https://github.com/HelloRAG