Hi

At a glance

Community members discuss whether LlamaIndex can load PDFs directly from an S3 bucket without first downloading the bucket contents locally. The first reply suggests it may not be possible out of the box, but that writing a custom loader is not difficult. The same member adds that the GitHub repo loader might do something similar, though they could not confirm it at a glance. The original poster then tried the S3 reader from LlamaHub and hit an error saying the resource had been permanently moved. They resolved it by reading the class definition: the fix was to use the "prefix" parameter instead of "key" and to specify the S3 endpoint URL.

TTurner

Hi
Is it natively possible to have LlamaIndex load PDFs from an S3 bucket without us having to download the bucket contents locally and then read them from there?

7 comments

LLogan M

hmmm I don't think it's possible? You'd definitely have to write your own loader though (which tbh is not too hard or scary)
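
For later readers, here is a minimal sketch of what such a custom loader could look like: list the PDF keys under a prefix with boto3, read each object into memory, and parse it with pypdf, so nothing is written to disk. The function names (`iter_pdf_keys`, `load_pdfs_from_s3`) are made up for illustration, and it assumes `boto3`, `pypdf`, and `llama-index-core` are installed. It is one possible approach, not how LlamaIndex itself does it.

```python
import io
from typing import Iterator, List


def iter_pdf_keys(s3_client, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield object keys under `prefix` that end in .pdf (case-insensitive)."""
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" entry.
        for obj in page.get("Contents", []):
            if obj["Key"].lower().endswith(".pdf"):
                yield obj["Key"]


def load_pdfs_from_s3(bucket: str, prefix: str = "") -> List["Document"]:
    """Hypothetical loader: read each PDF into memory and wrap it in a Document."""
    # Third-party imports stay local so iter_pdf_keys works without them.
    import boto3
    from pypdf import PdfReader
    from llama_index.core import Document

    s3 = boto3.client("s3")  # uses the default AWS credential chain
    docs = []
    for key in iter_pdf_keys(s3, bucket, prefix):
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        reader = PdfReader(io.BytesIO(body))  # parse from bytes, no temp file
        text = "\n".join(page.extract_text() or "" for page in reader.pages)
        docs.append(Document(text=text, metadata={"bucket": bucket, "key": key}))
    return docs
```

Something like `load_pdfs_from_s3("my-bucket", "reports/")` (placeholder names) would then return documents ready to pass to an index.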

LLogan M

I feel like the github repo loader does something like that, but I can't figure it out at a glance

TTurner

I tried one of the loaders on llamahub
https://llamahub.ai/l/readers/llama-index-readers-s3?from=readers

but I think I ran into an error that said the resource has been permanently moved
let me find the exact error message

TTurner

Attachment

LLogan M

google tells me this is an issue with your region name or other credentials? 👀

TTurner

hmm weird, I used the same credentials to download files through the CLI though 💀

TTurner

ah resolved it XD
went through the class definition and found out I was supposed to use "prefix" instead of "key", and had to specify the s3_endpoint_url 😆
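
For reference, a rough sketch of that fix. The bucket name, prefix, and region are placeholders. The regional endpoint format (`https://s3.<region>.amazonaws.com`) is an assumption for standard AWS regions, since the exact URL used was not posted. The helper names are made up. It assumes `llama-index-readers-s3` is installed and that credentials are already configured.

```python
def regional_endpoint(region: str) -> str:
    """Build a regional S3 endpoint URL (assumed format for standard AWS regions)."""
    return f"https://s3.{region}.amazonaws.com"


def s3_reader_kwargs(bucket: str, prefix: str, region: str) -> dict:
    """Arguments for S3Reader: `prefix` selects many objects, `key` would name one."""
    return {
        "bucket": bucket,
        "prefix": prefix,  # not `key`, which points at a single file
        "s3_endpoint_url": regional_endpoint(region),
    }


def load_from_s3(bucket: str, prefix: str, region: str):
    """Load every object under `prefix` via the LlamaHub S3 reader."""
    # Imported here so the helpers above run without the package installed.
    from llama_index.readers.s3 import S3Reader

    reader = S3Reader(**s3_reader_kwargs(bucket, prefix, region))
    return reader.load_data()
```

A "permanently moved" response from S3 usually means the request went to an endpoint that does not match the bucket's region, which would explain why setting the endpoint URL helped even though the same credentials worked in the CLI.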
