Updated 2 days ago

Processing a Document in AWS S3

I have a problem processing a document stored in AWS S3. I can upload it, but the S3 reader is unable to work with the file I am sending it. I am getting this error:
Plain Text
Failed to load file ncl-staging/organizations/c28bd98c-5a1c-4bad-94be-718be1a32ec9/documents/knowledge.pdf with error: The input file_path must be a string or a list of strings.. Skipping...

I set up a local environment and tested it there, and it worked, but for some reason it fails in production.
Here is the code:
Plain Text
# Process document
loader = S3Reader(
    bucket=bucket_name,
    key=object_key,
    aws_access_id=settings.AWS_ACCESS_KEY_ID,
    aws_access_secret=settings.AWS_SECRET_ACCESS_KEY,
    s3_endpoint_url=get_s3_endpoint(settings.AWS_REGION),
    file_extractor=self.file_extractors,
)
llama_documents = await loader.aload_data()

if not llama_documents:
    raise ValueError("No documents were processed from the input file")
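
Since the same code works locally but fails in production, one thing worth comparing between the two environments is the installed package versions. The snippet below is only a sketch using the standard library; the helper name `installed_versions` is hypothetical, and the package list is an assumption about what this setup has installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Assumed package names; adjust to whatever the project actually depends on.
PACKAGES = ["llama-index-core", "llama-index-readers-s3", "llama-parse"]

for name, installed in installed_versions(PACKAGES).items():
    print(f"{name}: {installed or 'not installed'}")
```

Running this both locally and inside the ECS container and diffing the output would show whether the two environments resolved different releases.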
6 comments
I faced a similar issue two days ago. In my case it was caused by the PurePosixPath being converted to a WindowsPath in the default PDFReader.

I solved it by inheriting from the PDFReader class and changing how the file path is handled.
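
A minimal sketch of the kind of path handling described above, not the actual class from this comment. It uses only the standard library: `normalize_file_path` and `StrPathExtractor` are hypothetical names, and the wrapped object is assumed to expose `load_data(file, extra_info=None)`. A real fix would subclass the library's PDF reader instead. The sketch relies on the fact that `PurePosixPath` is not a subclass of `Path`, so a check such as `isinstance(x, (str, Path))` rejects it. Whether that is what triggers the error in this thread is an assumption.

```python
import os
from pathlib import Path, PurePath, PurePosixPath


def normalize_file_path(file_path):
    """Return a plain str for any pathlib path; leave other inputs untouched."""
    if isinstance(file_path, PurePath):
        return os.fspath(file_path)
    return file_path


class StrPathExtractor:
    """Hypothetical wrapper that hands a str path to the wrapped extractor."""

    def __init__(self, inner):
        self.inner = inner

    def load_data(self, file, extra_info=None):
        # Convert the path before delegating, so the inner extractor
        # never sees a PurePosixPath or a platform-specific Path object.
        return self.inner.load_data(normalize_file_path(file), extra_info=extra_info)


# A key-style path, similar in shape to what a remote filesystem might pass along.
s3_style_path = PurePosixPath("bucket/documents/knowledge.pdf")
print(isinstance(s3_style_path, Path))      # PurePosixPath is not a Path
print(normalize_file_path(s3_style_path))   # plain string with forward slashes
```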

The only difference in your case is that it worked when you tried it locally.
Once you have this PDFParser, pass it to S3Reader like this:
Plain Text
loader = S3Reader(
    bucket=bucket_name,
    key=object_key,
    aws_access_id=settings.AWS_ACCESS_KEY_ID,
    aws_access_secret=settings.AWS_SECRET_ACCESS_KEY,
    s3_endpoint_url=get_s3_endpoint(settings.AWS_REGION),
    file_extractor={".pdf": PDFParser()},
)
Try this and see if it works for you.
@WhiteFang_Jr I am using LlamaParse.
I tried handling it with just the PDFParser, and it worked locally, just not on the production server running on ECS.
Oh okay, then ignore the approach shared above.