
Updated 2 years ago

But it did not load anything when I was


But it did not load anything when I was testing with a 50 MB PDF and a 1 MB docx file.

5 comments

Interesting 🤔 And no errors?

I know that for PDF parsing, it is just using PyPDF under the hood. If the PDFs are just scanned images (i.e. the text is not actually in the PDF), I would convert the pages to images first.
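
One way to check whether a PDF is just scanned images is to see if any page has an extractable text layer. Below is a minimal sketch, assuming the `pypdf` package is installed; the helper names `looks_scanned` and `extract_page_texts` and the `min_chars` threshold are made up for illustration and are not part of any library.

```python
from typing import Iterable, List


def looks_scanned(page_texts: Iterable[str], min_chars: int = 20) -> bool:
    """Return True when no page has at least `min_chars` non-whitespace characters.

    `min_chars` is an arbitrary illustrative threshold, not a library default.
    An empty list of pages also counts as "no text found".
    """
    for text in page_texts:
        visible = "".join((text or "").split())  # drop all whitespace
        if len(visible) >= min_chars:
            return False
    return True


def extract_page_texts(pdf_path: str) -> List[str]:
    """Extract the text layer of every page with pypdf.

    The import is done lazily so the check above can be used on its own.
    Assumes `pip install pypdf`.
    """
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]
```

Calling `looks_scanned(extract_page_texts("my_file.pdf"))` (the path is a placeholder) and getting `True` would suggest the PDF has no usable text layer, which would explain why nothing was loaded from it.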

Not sure why the docx didn't work. I would try loading a single file to see if it works.
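
Loading one file at a time could look roughly like the sketch below. It assumes `SimpleDirectoryReader` accepts an `input_files` list and that the import path matches your installed version (newer releases use `from llama_index.core import SimpleDirectoryReader`). The `load_one_by_one` and `default_loader` helpers are hypothetical names for this example, and reading a `.text` attribute on each document is also an assumption.

```python
from typing import Callable, Dict, Iterable, List


def default_loader(path: str) -> List[object]:
    """Load a single file with SimpleDirectoryReader (imported lazily).

    Assumption: adjust the import to match your installed version.
    """
    from llama_index import SimpleDirectoryReader

    return SimpleDirectoryReader(input_files=[path]).load_data()


def load_one_by_one(
    paths: Iterable[str],
    load_file: Callable[[str], List[object]] = default_loader,
) -> Dict[str, str]:
    """Try each file separately and record what happened for each one."""
    report: Dict[str, str] = {}
    for path in paths:
        try:
            docs = load_file(path)
        except Exception as exc:  # surface errors instead of hiding them
            report[path] = f"error: {type(exc).__name__}: {exc}"
            continue
        # Assumes each document exposes its content as `.text`.
        chars = sum(len(getattr(doc, "text", "") or "") for doc in docs)
        report[path] = f"{len(docs)} document(s), {chars} characters"
    return report
```

Printing the result of `load_one_by_one(["big.pdf", "small.docx"])` (placeholder file names) should show whether each file raised an error, loaded nothing, or loaded documents with no text in them.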

Ray Li

Thank you. I will try that

Ray Li

Is there a list of supported file types?

Ray Li

Kinda blind on what is supported and what’s not

Logan M

For the SimpleDirectoryReader, the list is here: https://github.com/jerryjliu/llama_index/blob/main/gpt_index/readers/file/base.py#L19
