Why is this not working?

At a glance

This thread covers a problem loading data from a PDF file with PDFReader. The original post shows the error "AttributeError: 'str' object has no attribute 'name'". Commenters point out that load_data needs a path object rather than a string. They also raise the possibility that the PDF contains images instead of text, and suggest OCR tools such as OCRmyPDF or HelloRAG to extract the text. LlamaParse, a service that offers free usage for up to 1000 pages per day, is mentioned as another potential solution.

kanhaiya lal bohra

Why is this not working?
pdf = PDFReader().load_data("./d.pdf")
print(pdf)

AttributeError: 'str' object has no attribute 'name'
metadata = {"page_label": page_label, "file_name": file.name}

20 comments

WhiteFang_Jr

I think you need to provide a path object: https://github.com/run-llama/llama_index/blob/9163067027ea8222e9fe5bffff9a2fac26b57686/llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/docs/base.py#L32

kanhaiya lal bohra

But I think the data is not loading. I am getting this response:
The context only mentions page labels and file names for a document named "d.pdf".

WhiteFang_Jr

Actually, if you look at the code in the link above, it extracts metadata from the file's path object, which is not possible with a string.
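To illustrate the point above, here is a minimal sketch of why the string fails. The function build_metadata is a hypothetical stand-in that only mirrors the metadata line from the traceback in the original post; it is not the reader's actual code.

```python
from pathlib import Path


def build_metadata(file, page_label):
    # Hypothetical stand-in for the metadata line shown in the traceback.
    return {"page_label": page_label, "file_name": file.name}


# A Path object exposes .name, so the lookup succeeds.
metadata = build_metadata(Path("./d.pdf"), "1")
print(metadata)

# A plain string has no .name attribute, which reproduces the reported error.
try:
    build_metadata("./d.pdf", "1")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)
print(error_message)
```

The same attribute lookup works on the Path and fails on the str, which matches the AttributeError in the question.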

kanhaiya lal bohra

So how can I get the data?

kanhaiya lal bohra

The text is empty.

WhiteFang_Jr

Where?

WhiteFang_Jr

Pass in a path object for the file location.
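A sketch of that suggestion, assuming the llama-index-readers-file package is installed and that PDFReader is importable from llama_index.readers.file, as the linked file path suggests. The helper names as_path and load_pdf are made up for illustration.

```python
from pathlib import Path


def as_path(location):
    # Accept either a string or a Path and always hand back a Path.
    return location if isinstance(location, Path) else Path(location)


def load_pdf(location):
    # Imported lazily so as_path stays usable without llama-index installed.
    # Assumption: the llama-index-readers-file package provides this import.
    from llama_index.readers.file import PDFReader

    return PDFReader().load_data(as_path(location))


if __name__ == "__main__":
    pdf_path = as_path("./d.pdf")
    if pdf_path.exists():
        for doc in load_pdf(pdf_path):
            # Show the metadata and the start of the extracted text per page.
            print(doc.metadata, repr(doc.text[:80]))
    else:
        print(f"{pdf_path} not found")
```

Printing the text next to the metadata makes it easy to spot the case where the metadata is filled in but the page text is empty.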

kanhaiya lal bohra

Done, but when I print the text I get an empty string, while the other details show up correctly. Maybe my PDF contains images rather than text?

kanhaiya lal bohra

I can't copy the text in the browser or in Adobe Reader either, so maybe it's an image.
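One way to check that suspicion is to see whether any page yields extractable text. This is only a sketch: it assumes the third-party pypdf package is available, and the helper names looks_image_only and extract_page_texts are hypothetical.

```python
from pathlib import Path


def looks_image_only(page_texts):
    # True when no page yielded any non-whitespace text.
    return all(not (text or "").strip() for text in page_texts)


def extract_page_texts(pdf_path):
    # Lazy import: requires the third-party pypdf package.
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    return [page.extract_text() or "" for page in reader.pages]


if __name__ == "__main__":
    pdf_path = Path("./d.pdf")
    if pdf_path.exists():
        texts = extract_page_texts(pdf_path)
        print("image-only (likely needs OCR):", looks_image_only(texts))
    else:
        print(f"{pdf_path} not found")
```

If every page comes back empty, the file is probably scanned images, and an OCR step would be needed before any reader can return text.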