
Updated last year

Hello team!

At a glance

The community member is trying to load documents with SimpleDirectoryReader but hits an error on a PPTX file that contains images. Converting the PPTX to PDF also produces an error. The problem appears to be that PIL cannot read the WMF images embedded in the file. Community members suggested running a snippet that lists the image formats PIL supports and checking whether WMF is among them. WMF was listed, yet loading still failed on both Mac and Windows. One community member reported that the PptxReader loader, fetched through download_loader from the llama_index package, read the file successfully.

Hello team!
I'm trying to read documents using SimpleDirectoryReader, but it gives this error.
The error comes from a pptx file that contains images.

import os
from typing import Dict

from llama_index import SimpleDirectoryReader

def get_file_metadata(filename: str) -> Dict:
    # Use the file name without its extension as the document metadata.
    file_name = os.path.basename(filename)
    file_name_without_extension = os.path.splitext(file_name)[0]
    metadata = {"file_name": file_name_without_extension}
    return metadata

chatbot_data = SimpleDirectoryReader("data/", file_metadata=get_file_metadata).load_data()

Here is my code for reading the data folder.
Attachment
image.png
20 comments
If I convert the pptx file to a PDF using an online tool,
it gives this error:
Attachment
image.png
@Logan M have a look here
@Ahsan Mirza it seems the issue is with PIL while reading an image.
If you convert it to PDF and load it with SimpleDirectoryReader, I'm not sure whether 'pypdf' will read the images in the PDF.
from PIL import Image

print(Image.registered_extensions())
Can you try running this code?
It will list the image formats PIL supports on your system. Check whether the WMF format is present in the output. If it is not, that is the issue here.
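The check described above can be sketched as a small helper, so there is no need to scan the printed dictionary by eye. The names `extensions_for_format` and `pillow_registered_extensions` are made up for this illustration; only `Image.registered_extensions()` comes from Pillow. Note that an extension being registered only says Pillow knows the format name, not that it can decode every such file on your platform.

```python
from typing import Dict, List


def extensions_for_format(registered: Dict[str, str], fmt: str) -> List[str]:
    """Return the sorted file extensions that map to the given format name."""
    wanted = fmt.upper()
    return sorted(ext for ext, name in registered.items() if name.upper() == wanted)


def pillow_registered_extensions() -> Dict[str, str]:
    """Fetch Pillow's extension table (Pillow is imported lazily here)."""
    from PIL import Image

    return dict(Image.registered_extensions())
```

In a Python shell or notebook with Pillow installed, `extensions_for_format(pillow_registered_extensions(), "WMF")` returns the extensions registered for WMF, or an empty list if there are none.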
Please guide me: where should I run this code?
In a Python shell or a Jupyter notebook.
Here are the file extensions Pillow reports:
{'.blp': 'BLP', '.bmp': 'BMP', '.dib': 'DIB', '.bufr': 'BUFR', '.cur': 'CUR', '.pcx': 'PCX', '.dcx': 'DCX', '.dds': 'DDS', '.ps': 'EPS', '.eps': 'EPS', '.fit': 'FITS', '.fits': 'FITS', '.fli': 'FLI', '.flc': 'FLI', '.ftc': 'FTEX', '.ftu': 'FTEX', '.gbr': 'GBR', '.gif': 'GIF', '.grib': 'GRIB', '.h5': 'HDF5', '.hdf': 'HDF5', '.png': 'PNG', '.apng': 'PNG', '.jp2': 'JPEG2000', '.j2k': 'JPEG2000', '.jpc': 'JPEG2000', '.jpf': 'JPEG2000', '.jpx': 'JPEG2000', '.j2c': 'JPEG2000', '.icns': 'ICNS', '.ico': 'ICO', '.im': 'IM', '.iim': 'IPTC', '.jfif': 'JPEG', '.jpe': 'JPEG', '.jpg': 'JPEG', '.jpeg': 'JPEG', '.mpg': 'MPEG', '.mpeg': 'MPEG', '.tif': 'TIFF', '.tiff': 'TIFF', '.mpo': 'MPO', '.msp': 'MSP', '.palm': 'PALM', '.pcd': 'PCD', '.pdf': 'PDF', '.pxr': 'PIXAR', '.pbm': 'PPM', '.pgm': 'PPM', '.ppm': 'PPM', '.pnm': 'PPM', '.psd': 'PSD', '.qoi': 'QOI', '.bw': 'SGI', '.rgb': 'SGI', '.rgba': 'SGI', '.sgi': 'SGI', '.ras': 'SUN', '.tga': 'TGA', '.icb': 'TGA', '.vda': 'TGA', '.vst': 'TGA', '.webp': 'WEBP', '.wmf': 'WMF', '.emf': 'WMF', '.xbm': 'XBM', '.xpm': 'XPM'}
If I run the same code on a Windows machine, it shows this error:
Attachment
image.png
On Mac it shows "cannot find loader for WMF file", and on Windows "cannot read metaFile".
@Logan M @ravitheja
Okay, interesting. PIL lists WMF as supported, but it seems unable to recognize the format. Weird.
How can I solve this bug?
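To confirm which embedded images are involved, one option is to look inside the presentation itself. A .pptx file is a ZIP archive, and embedded pictures are normally stored under `ppt/media/`. The sketch below uses only the standard library; `count_media_extensions` is a hypothetical helper name, and the `ppt/media/` location is the usual layout rather than something verified against this particular file.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath
from typing import Dict


def count_media_extensions(pptx_path: str) -> Dict[str, int]:
    """Count embedded media files in a .pptx by lowercase file extension."""
    counts: Counter = Counter()
    with zipfile.ZipFile(pptx_path) as archive:
        for name in archive.namelist():
            # Embedded pictures usually live under ppt/media/ in the archive.
            if name.startswith("ppt/media/") and not name.endswith("/"):
                counts[PurePosixPath(name).suffix.lower()] += 1
    return dict(counts)
```

If the result contains `.wmf` or `.emf` entries, those are the images PIL is being asked to open.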
@Logan M @ravitheja
@Ahsan Mirza If you can share the pptx over DM, I can try checking on my end.
@ravitheja check your DMs
from pathlib import Path
from llama_index import download_loader

PptxReader = download_loader("PptxReader")

loader = PptxReader()
documents = loader.load_data(file=Path('./Bank_book_PP_draft.pptx'))
This worked for me, and it ran quickly.