I know that for PDF parsing it just uses PyPDF under the hood. If the PDFs are just scanned images (i.e. there is no actual text layer in the PDF), PyPDF has nothing to extract, so I would convert the pages to images first.
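
A quick way to check whether a PDF is image-only is to see how much text PyPDF can pull out of each page. This is a rough sketch, assuming the current `pypdf` package is installed (older installs may use the `PyPDF2` package instead); the helper names and the `min_chars` threshold are made up for illustration.

```python
from typing import List


def extract_page_texts(pdf_path: str) -> List[str]:
    """Return the extracted text of each page (empty string if none)."""
    # Imported lazily so looks_scanned() can be used without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


def looks_scanned(page_texts: List[str], min_chars: int = 20) -> bool:
    """Heuristic: True if every page has fewer than `min_chars`
    non-whitespace characters, i.e. there is effectively no text layer."""
    return all(len("".join(text.split())) < min_chars for text in page_texts)
```

Something like `looks_scanned(extract_page_texts("my_file.pdf"))` (the path is a placeholder) returning `True` would suggest the file is image-only.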
Not sure why the docx didn't work. I would try loading a single file to see whether that works.
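
When testing a single file, it can also help to rule out a problem with the file itself. A `.docx` is a zip archive whose main body lives in `word/document.xml`, so a standard-library check like the one below can tell a broken or mislabelled file apart from a loader problem. It is only a sketch, and `check_docx` is a hypothetical helper name, not part of any loader.

```python
import zipfile


def check_docx(path: str) -> str:
    """Return a short diagnosis for a single .docx file."""
    # A real .docx is a zip container; old binary .doc files are not.
    if not zipfile.is_zipfile(path):
        return "not a zip archive (possibly an old .doc file or a corrupt download)"
    with zipfile.ZipFile(path) as archive:
        # The main document body is stored under this entry name.
        if "word/document.xml" not in archive.namelist():
            return "zip archive without word/document.xml (not a Word document)"
    return "looks like a valid .docx container"
```

If the file passes this check but still fails to load, the issue is more likely in the loader or its dependencies than in the document.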