Updated 10 months ago

Files

At a glance

The community member is using the SimpleDirectoryReader to parse docx and pptx files, but is encountering file loading errors for these file types, while txt and pdf files are working fine. The community member is seeking suggestions or thoughts on this issue. In the comments, another community member suggests that the default docx and pptx readers may be breaking on the files, and recommends loading the documents directly instead of using the SimpleDirectoryReader wrappers.

Useful resources
Hey all - I am using the SimpleDirectoryReader to parse some basic Office files (docx & pptx). According to the docs, both are supported by default. However, I am getting file loading errors for both, while txt and pdf files work fine. The files are basic and small. Any thoughts or suggestions?

Errors:
Failed to load file dev/llamaindex/multi-modal/Bikes.pptx with error: The expanded size of the tensor (16) must match the existing size (14) at non-singleton dimension 0. Target sizes: [16]. Tensor sizes: [14]. Skipping...
Failed to load file /dev/llamaindex/multi-modal/~$ke_Company_Data.docx with error: File is not a zip file. Skipping...


SimpleDirectoryReader - https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/?gad_source=1&gclid=Cj0KCQjw5cOwBhCiARIsAJ5njubnGYY3NjP8r3E42fQb_lLj3hG8QwN7xhrXol1Qz71aqWshIPDGkk0aAlnREALw_wcB
3 comments
Seems like the default docx and pptx readers are breaking on your files
These readers are just light wrappers around existing libraries

I suggest just loading the document yourself and putting it into a document object
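
A minimal sketch of what "loading the document yourself" could look like for the docx case, using only the standard library. It is not the thread author's code, and the helper names (`read_docx_text`, `load_docx_dir`, `to_documents`) are made up for illustration. It reads only the main body text (no headers, footnotes, or images), and it assumes `Document` can be imported from `llama_index.core`, which depends on the installed version. Note that the failing file in the second error starts with `~$`, which is the prefix Word gives the temporary lock file it creates while a document is open. That file is not a real docx, so the sketch skips it.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# XML namespace used by WordprocessingML elements inside a .docx archive.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_text(path):
    """Return the body text of a .docx file, one paragraph per line."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)


def load_docx_dir(directory):
    """Return (text, metadata) pairs for every readable .docx in a directory."""
    results = []
    for path in sorted(Path(directory).glob("*.docx")):
        # Skip Word lock files such as "~$ke_Company_Data.docx".
        if path.name.startswith("~$"):
            continue
        # Skip anything else that is not a valid zip archive.
        if not zipfile.is_zipfile(path):
            continue
        results.append((read_docx_text(path), {"file_name": path.name}))
    return results


def to_documents(pairs):
    """Wrap (text, metadata) pairs in LlamaIndex Document objects."""
    # Imported lazily so the helpers above work without llama-index installed.
    # Assumption: the import path matches your installed llama-index version.
    from llama_index.core import Document

    return [Document(text=text, metadata=meta) for text, meta in pairs]
```

Calling `to_documents(load_docx_dir("path/to/folder"))` would then give a list of documents to index. The same pattern should carry over to pptx with a library of your choice for extracting the slide text.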