Updated 2 years ago

PDFReader should be able to parse S3 URLs

smokeoX

PDFReader should be able to parse S3 URLs, right? I am 100% sure I had this working a few days back.

smokeoX

I am seeing:

FileNotFoundError: [Errno 2] No such file or directory: 'https:/mybucket.s3.us-east-1.amazonaws.com/s3--fb3cf9f4addf/bitcoin.pdf'

But when I go to that URL in my browser, I can see the PDF (it is fully publicly accessible).

smokeoX

It comes from:

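        from pathlib import Path
        from llama_index import download_loader
        # chatbotUrl holds the public S3 URL of the PDF
        PDFReader = download_loader("PDFReader")
        loader = PDFReader()
        documents = loader.load_data(file=Path(chatbotUrl))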
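Note the single slash in "https:/" in the error above. A minimal illustration, assuming a POSIX system: wrapping the URL in a pathlib path treats it as a filesystem path and collapses the "//" after the scheme, so PDFReader ends up trying to open a local path that cannot exist. PurePosixPath is used here only so the result does not depend on the OS.

```python
from pathlib import PurePosixPath

url = "https://mybucket.s3.us-east-1.amazonaws.com/s3--fb3cf9f4addf/bitcoin.pdf"

# Path objects normalise repeated separators, so "https://" becomes "https:/".
as_path = PurePosixPath(url)
print(as_path)
# https:/mybucket.s3.us-east-1.amazonaws.com/s3--fb3cf9f4addf/bitcoin.pdf

# Nothing here is URL-aware: the "scheme" is just the first relative path component.
print(as_path.parts[0])
# https:
```

This matches the path shown in the FileNotFoundError, which suggests the URL was never fetched over HTTP at all.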

jerryjliu0

@smokeoX Hm, the PDFReader reads local files. Was this working for you before? You could try our S3 reader (which calls the PDFReader under the hood if it's a PDF file).
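Since the PDFReader only reads local files, one workaround for a publicly accessible object is to download it first and pass the local path instead. A minimal sketch using only the standard library; fetch_to_local is a hypothetical helper (not part of llama_index), and it assumes the URL can be fetched without credentials.

```python
import os
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def fetch_to_local(url: str) -> Path:
    """Download `url` to a new local temp file and return its path.

    Hypothetical helper: the PDFReader opens local paths, so it is given a
    local copy of the object rather than the URL itself.
    """
    # Keep the original extension (e.g. ".pdf") on the temp file name.
    suffix = Path(urlparse(url).path).suffix
    fd, tmp_name = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # close our handle; urlretrieve reopens the file for writing
    urllib.request.urlretrieve(url, tmp_name)
    return Path(tmp_name)
```

With the loader from the original snippet, usage would be along the lines of documents = loader.load_data(file=fetch_to_local(chatbotUrl)). The temp file is not cleaned up automatically, so you may want to delete it once the documents are loaded.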

smokeoX

OK, I'll try the S3 reader. I remember having a different set of issues with that, but it probably makes sense to use the right tool for the job! 😄
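An S3 reader addresses objects by bucket and key rather than by URL. A small sketch for splitting a virtual-hosted-style URL like the one in the error above into those two parts; parse_s3_url is a hypothetical helper, and path-style URLs or legacy "s3-<region>" hostnames are deliberately not handled.

```python
from typing import Tuple
from urllib.parse import unquote, urlparse


def parse_s3_url(url: str) -> Tuple[str, str]:
    """Split "https://<bucket>.s3[.<region>].amazonaws.com/<key>" into (bucket, key)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Split on the last ".s3." so bucket names containing dots still work.
    bucket, sep, rest = host.rpartition(".s3.")
    if not sep or not bucket or not rest.endswith("amazonaws.com"):
        raise ValueError(f"not a virtual-hosted S3 URL: {url!r}")
    # The object key is the URL path without the leading slash, percent-decoded.
    key = unquote(parsed.path.lstrip("/"))
    if not key:
        raise ValueError(f"URL has no object key: {url!r}")
    return bucket, key
```

The two values could then be passed as S3Reader(bucket=bucket, key=key), assuming the LlamaHub S3Reader in your version accepts those constructor arguments; the loader's README is the place to confirm that.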

smokeoX

Thanks @jerryjliu0!

smokeoX

Hmm, with that I am getting:

FileNotFoundError: [Errno 2] No such file or directory: '/Users/sim/.pyenv/versions/3.9.2/lib/python3.9/site-packages/llama_index/readers/llamahub_modules/file/base.py'

But I think this is related to my packages/environment?
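One way to check whether this is an environment problem is to see whether the file named in the traceback actually exists in the installed package. A rough diagnostic sketch; installed_file is a hypothetical helper, and the relative path is copied from the error message above, so the check only makes sense for llama_index versions that keep LlamaHub loaders under readers/llamahub_modules.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def installed_file(package: str, relative: str) -> Optional[Path]:
    """Return where `relative` would live inside an installed package, or None if it isn't importable."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or a plain module rather than a package
    package_dir = Path(next(iter(spec.submodule_search_locations)))
    return package_dir / relative


target = installed_file("llama_index", "readers/llamahub_modules/file/base.py")
if target is None:
    print("llama_index is not installed in this interpreter")
else:
    print(target, "exists" if target.exists() else "is MISSING")
```

Running this with the same interpreter that raised the error (here the pyenv 3.9.2 one) shows whether the file is genuinely absent or the loader is looking in the wrong place.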