Updated last year

Hello team!

I'm trying to read documents using SimpleDirectoryReader, but it gives this error.
The error comes from a PPTX file that contains images.

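```python
import os
from typing import Dict

# Legacy import path, as used elsewhere in this thread; llama_index >= 0.10
# exposes it as `from llama_index.core import SimpleDirectoryReader`
from llama_index import SimpleDirectoryReader


def get_file_metadata(filename: str) -> Dict:
    # Keep only the file name, without directory or extension
    file_name = os.path.basename(filename)
    file_name_without_extension = os.path.splitext(file_name)[0]
    metadata = {"file_name": file_name_without_extension}
    return metadata


chatbot_data = SimpleDirectoryReader("data/", file_metadata=get_file_metadata).load_data()
```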

Above is my code for reading the data folder.
[Attachment: image.png]
If I convert the PPTX file to PDF using an online tool, it gives this error:
[Attachment: image.png]
@Logan M have a look here
@Ahsan Mirza It seems the issue is with PIL while reading an image.
If you convert the file to PDF and load it with SimpleDirectoryReader, I'm not sure the images in the PDF will be read by 'pypdf'.
```python
from PIL import Image

# Mapping of file extensions to the formats Pillow has registered plugins for
print(Image.registered_extensions())
```

Can you try running this code?
It lists the image formats PIL supports on your system. Check whether the WMF format is present in the output; if it is not, that is the issue here.
Could you tell me where I should run this code?
In a Python shell or a Jupyter notebook.
Here are the file extensions Pillow can read on my machine:
{'.blp': 'BLP', '.bmp': 'BMP', '.dib': 'DIB', '.bufr': 'BUFR', '.cur': 'CUR', '.pcx': 'PCX', '.dcx': 'DCX', '.dds': 'DDS', '.ps': 'EPS', '.eps': 'EPS', '.fit': 'FITS', '.fits': 'FITS', '.fli': 'FLI', '.flc': 'FLI', '.ftc': 'FTEX', '.ftu': 'FTEX', '.gbr': 'GBR', '.gif': 'GIF', '.grib': 'GRIB', '.h5': 'HDF5', '.hdf': 'HDF5', '.png': 'PNG', '.apng': 'PNG', '.jp2': 'JPEG2000', '.j2k': 'JPEG2000', '.jpc': 'JPEG2000', '.jpf': 'JPEG2000', '.jpx': 'JPEG2000', '.j2c': 'JPEG2000', '.icns': 'ICNS', '.ico': 'ICO', '.im': 'IM', '.iim': 'IPTC', '.jfif': 'JPEG', '.jpe': 'JPEG', '.jpg': 'JPEG', '.jpeg': 'JPEG', '.mpg': 'MPEG', '.mpeg': 'MPEG', '.tif': 'TIFF', '.tiff': 'TIFF', '.mpo': 'MPO', '.msp': 'MSP', '.palm': 'PALM', '.pcd': 'PCD', '.pdf': 'PDF', '.pxr': 'PIXAR', '.pbm': 'PPM', '.pgm': 'PPM', '.ppm': 'PPM', '.pnm': 'PPM', '.psd': 'PSD', '.qoi': 'QOI', '.bw': 'SGI', '.rgb': 'SGI', '.rgba': 'SGI', '.sgi': 'SGI', '.ras': 'SUN', '.tga': 'TGA', '.icb': 'TGA', '.vda': 'TGA', '.vst': 'TGA', '.webp': 'WEBP', '.wmf': 'WMF', '.emf': 'WMF', '.xbm': 'XBM', '.xpm': 'XPM'}
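Rather than scanning that dictionary by eye, a small helper can report which extensions of interest are missing. This is a minimal sketch; `missing_extensions` is a hypothetical helper name, and its input is assumed to be the dictionary returned by `Image.registered_extensions()`. Applied to the output above, it would return an empty list, since both `.wmf` and `.emf` are present.

```python
from typing import Dict, Iterable, List


def missing_extensions(
    registered: Dict[str, str],
    wanted: Iterable[str] = (".wmf", ".emf"),
) -> List[str]:
    """Return the extensions from `wanted` that are not keys of `registered`.

    `registered` is expected to look like the mapping returned by
    PIL.Image.registered_extensions(), e.g. {".wmf": "WMF", ...}.
    """
    # Pillow stores its extension keys lowercased, with the leading dot
    return [ext for ext in wanted if ext.lower() not in registered]


# Usage (requires Pillow):
#   from PIL import Image
#   print(missing_extensions(Image.registered_extensions()))
```

Note that a format appearing in this mapping only means Pillow has a plugin registered for it; as the rest of the thread shows, that does not guarantee a given file can actually be decoded.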
If I run the same code on a Windows machine, it shows this error:
[Attachment: image.png]
On macOS it says it cannot find a loader for the WMF file, and on Windows it says it cannot read the metafile.
@Logan M @ravitheja
Okay, interesting. PIL supports WMF, but it seems it's unable to recognize the format. Weird.
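Pillow's file-format documentation states that it can identify WMF and EMF files on any platform but can only read them on Windows (other platforms need a handler registered via `WmfImagePlugin.register_handler()`), which may explain the macOS "cannot find loader" message even though `.wmf` is listed. To find which presentations actually embed such images, note that a `.pptx` file is a ZIP (Office Open XML) package whose embedded media normally live under `ppt/media/`. The sketch below uses only the standard library; `find_metafile_images`, `scan_folder`, and the `data/` path are illustrative assumptions, not part of LlamaIndex.

```python
import zipfile
from pathlib import Path
from typing import Dict, List, Union

# Extensions of Windows metafile images (WMF/EMF)
METAFILE_EXTS = (".wmf", ".emf")


def find_metafile_images(pptx_path: Union[str, Path]) -> List[str]:
    """Return sorted archive entries under ppt/media/ that look like WMF/EMF.

    A .pptx is a ZIP package, so zipfile can list its contents without
    any PowerPoint-specific library or image decoder.
    """
    with zipfile.ZipFile(pptx_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if name.startswith("ppt/media/") and name.lower().endswith(METAFILE_EXTS)
        )


def scan_folder(folder: Union[str, Path]) -> Dict[str, List[str]]:
    """Map each .pptx under `folder` (recursively) to its WMF/EMF entries."""
    results: Dict[str, List[str]] = {}
    for path in sorted(Path(folder).rglob("*.pptx")):
        try:
            hits = find_metafile_images(path)
        except zipfile.BadZipFile:
            # Skip files that are not valid ZIP packages (e.g. Office lock files)
            continue
        if hits:
            results[str(path)] = hits
    return results


# Usage (hypothetical folder, matching the "data/" folder in the thread):
#   print(scan_folder("data/"))
```

Files flagged this way could then be loaded separately, or have their metafile images replaced with PNGs in PowerPoint before indexing.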
How can I solve this bug?
@Logan M @ravitheja
@Ahsan Mirza If you can share the PPTX via DM, I can try checking it on my end.
@ravitheja Check your DMs.
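```python
from pathlib import Path

# Legacy llama_index API: fetches the PptxReader loader from LlamaHub
from llama_index import download_loader

PptxReader = download_loader("PptxReader")

loader = PptxReader()
# Load a single presentation into a list of Document objects
documents = loader.load_data(file=Path('./Bank_book_PP_draft.pptx'))
```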
This worked for me, and it was fast.
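If you would rather not depend on a downloaded loader, slide text can also be pulled straight out of the package with the standard library. Because only the slide XML is parsed, the embedded images (including WMF/EMF) are never decoded. This is a minimal sketch and not how `PptxReader` works internally: it assumes slides are stored as `ppt/slides/slideN.xml`, that visible text sits in DrawingML `a:t` elements, and that file-name order matches presentation order (common, but the real order is defined in `ppt/presentation.xml`). `extract_slide_text` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path
from typing import List, Union

# DrawingML namespace that holds text runs (<a:t>) in slide XML
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_RE = re.compile(r"^ppt/slides/slide(\d+)\.xml$")


def extract_slide_text(pptx_path: Union[str, Path]) -> List[str]:
    """Return one string per slide, ordered by the number in the file name.

    Only slide XML parts are read; nothing under ppt/media/ is opened,
    so no image decoder is involved.
    """
    with zipfile.ZipFile(pptx_path) as archive:
        slides = []
        for name in archive.namelist():
            match = SLIDE_RE.match(name)
            if match:
                slides.append((int(match.group(1)), name))
        texts = []
        for _, name in sorted(slides):
            root = ET.fromstring(archive.read(name))
            # Collect every non-empty text run and join them with spaces
            runs = [node.text for node in root.iter(A_NS + "t") if node.text]
            texts.append(" ".join(runs))
        return texts
```

Joining runs with spaces can split words that PowerPoint stored across several runs, so this is best treated as a quick fallback rather than a replacement for a proper reader.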