Updated 10 months ago
why this is not working
kanhaiya lal bohra
10 months ago
Why is this not working?
pdf = PDFReader().load_data("./d.pdf")
print(pdf)
It fails with:
AttributeError: 'str' object has no attribute 'name'
The traceback points at this line in the reader:
metadata = {"page_label": page_label, "file_name": file.name}
20 comments
WhiteFang_Jr
10 months ago
I think you need to pass a Path object rather than a string:
https://github.com/run-llama/llama_index/blob/9163067027ea8222e9fe5bffff9a2fac26b57686/llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/docs/base.py#L32
kanhaiya lal bohra
10 months ago
But I think the data is not loading. I am getting this response:
The context only mentions page labels and file names for a document named "d.pdf".
WhiteFang_Jr
10 months ago
Actually, if you look at the code in the link above, it extracts metadata from the file's path object, which is not possible with a string.
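A minimal sketch of the point above, assuming the reader builds its metadata from `file.name` as the traceback in the question shows. `build_metadata` and `load_pdf` are hypothetical helper names used only for illustration. `load_pdf` assumes the `llama-index-readers-file` package is installed and is defined here but not called.

```python
from pathlib import Path


def build_metadata(file, page_label):
    # Mirrors the metadata line from the traceback: it needs file.name,
    # which pathlib.Path provides and a plain str does not.
    return {"page_label": page_label, "file_name": file.name}


def load_pdf(path):
    # Hypothetical wrapper: coerce a str to Path before calling the reader.
    # Assumes llama-index-readers-file is installed.
    from llama_index.readers.file import PDFReader

    return PDFReader().load_data(Path(path))


# A plain string reproduces the AttributeError from the question.
try:
    build_metadata("./d.pdf", "1")
    string_worked = True
except AttributeError:
    string_worked = False

# A Path object exposes .name, so the metadata can be built.
path_metadata = build_metadata(Path("./d.pdf"), "1")
print(string_worked, path_metadata)
```

In the original snippet, the equivalent change would be `PDFReader().load_data(Path("./d.pdf"))`.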
kanhaiya lal bohra
10 months ago
So how can I get the data?
kanhaiya lal bohra
10 months ago
The text is empty.
WhiteFang_Jr
10 months ago
Where?
WhiteFang_Jr
10 months ago
Pass in a Path object for the file location.
kanhaiya lal bohra
10 months ago
Done, but when I try to print the text I get an empty string, although the other details are correct. Maybe my PDF contains images instead of text?
kanhaiya lal bohra
10 months ago
I can't copy the text in the browser or in Adobe Reader either, so maybe it's an image.
WhiteFang_Jr
10 months ago
Yes, that's possible.
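One way to check this programmatically is to look for pages whose extracted text is empty. The sketch below is only a heuristic, and `pages_without_text` and `looks_scanned` are hypothetical helper names. It works on a plain list of page strings, so it makes no assumption about which reader produced them.

```python
def pages_without_text(page_texts, min_chars=1):
    """Return 0-based indexes of pages whose extracted text is effectively empty."""
    return [
        i
        for i, text in enumerate(page_texts)
        if len((text or "").strip()) < min_chars
    ]


def looks_scanned(page_texts):
    # Heuristic: if no page yielded any text, the PDF is probably images only.
    return bool(page_texts) and len(pages_without_text(page_texts)) == len(page_texts)


all_empty = ["", "  \n", ""]
mixed = ["Hello", ""]
print(looks_scanned(all_empty), looks_scanned(mixed), pages_without_text(mixed))
```

With the documents returned by the reader, the input could be built as `[doc.text for doc in pdf]`, assuming each document exposes its content as `.text`.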
kanhaiya lal bohra
10 months ago
So how can I get the text? Is there a reader with OCR?
WhiteFang_Jr
10 months ago
Try LlamaParse:
https://www.llamaindex.ai/blog/launching-the-first-genai-native-document-parsing-platform
https://www.llamaindex.ai/blog/introducing-llamacloud-and-llamaparse-af8cedf9006b
kanhaiya lal bohra
10 months ago
It's not free?
WhiteFang_Jr
10 months ago
It's free for 1000 pages per day.
kanhaiya lal bohra
10 months ago
But I need a permanent solution. After some time this will be completely paid. Is there any other AI model?
WhiteFang_Jr
10 months ago
I think 1000 pages per day is a pretty good deal.
If not, there is also Unstructured, which you can check out.
kanhaiya lal bohra
10 months ago
I managed to do this with OCRmyPDF.
WhiteFang_Jr
10 months ago
How's the quality?
kanhaiya lal bohra
10 months ago
In my case it gets 100% of the text, failing only when an image or symbol appears.
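For reference, a rough sketch of driving the OCRmyPDF command-line tool from Python. The thread does not say how it was actually run, so the flags, file names, and the helper names `build_ocr_command` and `run_ocr` are assumptions. It requires the `ocrmypdf` executable (and Tesseract) to be installed.

```python
import shutil
import subprocess


def build_ocr_command(src, dst, language="eng"):
    # --skip-text leaves pages that already have a text layer untouched;
    # -l selects the Tesseract language to use.
    return ["ocrmypdf", "--skip-text", "-l", language, str(src), str(dst)]


def run_ocr(src, dst, language="eng"):
    # Returns False when the ocrmypdf executable is not on PATH.
    if shutil.which("ocrmypdf") is None:
        return False
    subprocess.run(build_ocr_command(src, dst, language), check=True)
    return True


# Hypothetical file names: d.pdf is the scanned input, d_ocr.pdf the output.
command = build_ocr_command("d.pdf", "d_ocr.pdf")
print(" ".join(command))
```

The output PDF carries a text layer, so it can then be loaded with the same reader as before.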
VenassaNIU
10 months ago
I think you can try HelloRAG. It excels at recognizing content within images in PDFs with its advanced OCR.
https://github.com/HelloRAG