
Updated 8 months ago

Files

Hey all - I am using SimpleDirectoryReader to parse some basic Office documents (docx and pptx). According to the docs, both formats are supported by default, but I am getting file loading errors for both. txt and PDF files load fine. The failing files are basic and small. Any thoughts or suggestions?

Errors:
Failed to load file dev/llamaindex/multi-modal/Bikes.pptx with error: The expanded size of the tensor (16) must match the existing size (14) at non-singleton dimension 0. Target sizes: [16]. Tensor sizes: [14]. Skipping...
Failed to load file /dev/llamaindex/multi-modal/~$ke_Company_Data.docx with error: File is not a zip file. Skipping...


SimpleDirectoryReader - https://docs.llamaindex.ai/en/stable/module_guides/loading/simpledirectoryreader/
3 comments
It seems like the default docx and pptx readers are breaking on your files.
These readers are just light wrappers around existing libraries.

I suggest loading the documents yourself and putting the extracted text into Document objects.
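
For the docx case, a minimal sketch of "loading it yourself" could look like the following. It uses only the Python standard library and relies on the fact that a .docx file is a zip archive containing word/document.xml. The helper names (is_office_lock_file, docx_to_text, load_docx_dir) are hypothetical, not part of LlamaIndex. The sketch also skips files whose names start with ~$: these are typically the temporary owner files Word creates while a document is open, and they are not real docx archives, which would be consistent with the "File is not a zip file" error above. It only extracts paragraph text and ignores tables' structure, headers, footers, and images.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# XML namespace used by WordprocessingML elements inside word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def is_office_lock_file(path):
    """Return True for Office owner/lock files such as '~$report.docx'."""
    return Path(path).name.startswith("~$")


def docx_to_text(path):
    """Extract paragraph text from a .docx file (hypothetical helper)."""
    with zipfile.ZipFile(path) as zf:
        xml_bytes = zf.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)


def load_docx_dir(directory):
    """Return one {'text', 'metadata'} record per real .docx file."""
    records = []
    for path in sorted(Path(directory).glob("*.docx")):
        if is_office_lock_file(path):
            continue  # not a real document, so do not try to parse it
        records.append(
            {"text": docx_to_text(path), "metadata": {"file_name": path.name}}
        )
    return records
```

Each record can then be wrapped in a LlamaIndex document, for example Document(text=record["text"], metadata=record["metadata"]), assuming a recent release where Document is imported from llama_index.core. The pptx tensor-size error is a separate problem that this sketch does not address.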