Updated 9 months ago
Why is this not working?
kanhaiya lal bohra
9 months ago
Why is this not working?
pdf = PDFReader().load_data("./d.pdf")
print(pdf)
It raises this error:
AttributeError: 'str' object has no attribute 'name'
The traceback points at this line:
metadata = {"page_label": page_label, "file_name": file.name}
WhiteFang_Jr
9 months ago
I think you need to pass a Path object rather than a string:
https://github.com/run-llama/llama_index/blob/9163067027ea8222e9fe5bffff9a2fac26b57686/llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/docs/base.py#L32
kanhaiya lal bohra
9 months ago
But I think the data is not loading. I am getting this response:
The context only mentions page labels and file names for a document named "d.pdf".
WhiteFang_Jr
9 months ago
Actually, if you look at the code in the link above, it extracts metadata from the file's path object (file.name), which is not possible from a string.
kanhaiya lal bohra
9 months ago
So how can I get the data?
kanhaiya lal bohra
9 months ago
the text is empty
WhiteFang_Jr
9 months ago
where?
WhiteFang_Jr
9 months ago
Pass in a Path object for the file location.
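A minimal sketch of that suggestion, assuming PDFReader is imported from llama_index.readers.file (the package the linked file lives in) and that the llama-index-readers-file package is installed. The helper names to_pdf_path and load_pdf are made up for illustration.

```python
from pathlib import Path


def to_pdf_path(location):
    """Coerce a str or Path into a pathlib.Path, which has a .name attribute."""
    return Path(location)


def load_pdf(location):
    """Load a PDF by handing the reader a Path instead of a plain string."""
    # Assumption: PDFReader is exposed by llama_index.readers.file.
    from llama_index.readers.file import PDFReader

    return PDFReader().load_data(file=to_pdf_path(location))


# A plain string has no .name, which is what triggers the AttributeError;
# the Path built from the same string does.
print(hasattr("./d.pdf", "name"))
print(to_pdf_path("./d.pdf").name)
```

Calling load_pdf("./d.pdf") should then return one document per page, each with page_label and file_name in its metadata.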
kanhaiya lal bohra
9 months ago
Done, but when I print the text I get an empty string, while the other details show correctly. Maybe my PDF contains images rather than text?
kanhaiya lal bohra
9 months ago
I can't copy the text in the browser or in Adobe Reader either, so maybe it's an image.
WhiteFang_Jr
9 months ago
Yes, that's possible.
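One way to confirm this from code is to list the pages whose extracted text is empty. This is only a sketch: it assumes each loaded document exposes a text attribute and a metadata dict with a page_label key (as in the metadata line from the traceback), and pages_without_text is a hypothetical helper, not part of any library.

```python
from types import SimpleNamespace


def pages_without_text(documents):
    """Return the page labels of documents whose text is empty or whitespace."""
    empty = []
    for index, doc in enumerate(documents):
        text = getattr(doc, "text", "") or ""
        if not text.strip():
            metadata = getattr(doc, "metadata", {}) or {}
            # Fall back to the position in the list if there is no page label.
            empty.append(metadata.get("page_label", str(index)))
    return empty


# Stand-in objects so the check can be tried without a real PDF.
sample = [
    SimpleNamespace(text="", metadata={"page_label": "1", "file_name": "d.pdf"}),
    SimpleNamespace(text="Some text", metadata={"page_label": "2", "file_name": "d.pdf"}),
]
print(pages_without_text(sample))
```

If every page of a real PDF comes back in that list, the file most likely has no text layer and needs OCR.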
kanhaiya lal bohra
9 months ago
So how can I get the text? Is there any reader with OCR?
WhiteFang_Jr
9 months ago
Try LlamaParse:
https://www.llamaindex.ai/blog/launching-the-first-genai-native-document-parsing-platform
https://www.llamaindex.ai/blog/introducing-llamacloud-and-llamaparse-af8cedf9006b
kanhaiya lal bohra
9 months ago
It's not free?
WhiteFang_Jr
9 months ago
Free for 1000 pages per day
kanhaiya lal bohra
9 months ago
But I need a permanent solution; after some time this will be completely paid. Is there any other AI model?
WhiteFang_Jr
9 months ago
I think 1000 pages per day is a pretty good deal.
If not, there is also Unstructured, which you can check out.
kanhaiya lal bohra
9 months ago
I managed to do this with OCRmyPDF.
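For readers who want to reproduce that step, here is a rough sketch of calling the OCRmyPDF command-line tool from Python. It assumes the ocrmypdf executable is installed and on PATH; the helper names and the output file name d_ocr.pdf are illustrative, and the exact options used in this thread were not shared.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(src, dst, language="eng"):
    """Build the ocrmypdf command line; --skip-text leaves pages that already have text untouched."""
    return ["ocrmypdf", "-l", language, "--skip-text", str(Path(src)), str(Path(dst))]


def add_text_layer(src, dst, language="eng"):
    """Run OCRmyPDF to write a copy of src with a searchable text layer to dst."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on PATH")
    subprocess.run(build_ocr_command(src, dst, language), check=True)
    return Path(dst)


print(build_ocr_command("d.pdf", "d_ocr.pdf"))
```

The OCRed copy can then be loaded with the PDF reader in place of the original file.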
WhiteFang_Jr
9 months ago
How's the quality?
kanhaiya lal bohra
9 months ago
In my case it gets 100% of the text, only failing when an image or symbol appears.
VenassaNIU
9 months ago
I think you can try HelloRAG. It excels at recognizing content within images in PDFs with its advanced OCR.
https://github.com/HelloRAG