
Updated 3 months ago

I think there is a bug in

I think there is a bug in SimpleDirectoryReader, or the documentation needs to be updated.
I am unable to parse/index .ppt files, but .pptx files work just fine.

Can anyone help or confirm what needs to be done?
Thanks,
24 comments
Is there an error? It's just using the PptxReader.
Attachment
image.png
It's using from pptx import Presentation (the python-pptx package) to parse; maybe that package is having trouble reading your file.
I'd recommend trying llama-parse if you haven't already.
It is using the PptxReader, which parses only .pptx and not .ppt.
This is from inside the pptx module:
Attachment
image.png
@Logan M llama-parse is limited to PDFs as of now, at least in the web version that I interacted with.
Any suggestions?
PptxReader parses only the .pptx file type, not .ppt.
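For context on why a reader built on python-pptx fails here: the two formats are different containers. A .pptx is a ZIP archive, while a legacy .ppt is an OLE2 compound binary file, which python-pptx does not read. Below is a minimal sketch for checking which one a file really is, regardless of its extension. The function names are hypothetical helpers, not part of llama-index or python-pptx.

```python
# Leading bytes ("magic numbers") of the two container formats.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary Office files (.ppt, .doc, .xls)
ZIP_MAGIC = b"PK\x03\x04"                       # ZIP archives, which is what a .pptx is


def sniff_bytes(head):
    """Classify a file from its first bytes: 'ole2', 'zip', or 'unknown'."""
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def sniff_file(path):
    """Read the first 8 bytes of a file and classify its container format."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(8))
```

If sniff_file reports "ole2" for a file, it is a legacy binary presentation no matter what the extension says, and it would need converting before a python-pptx based reader can open it.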
llama-parse was recently updated to handle 50+ file types
The .ppt extension is mapped to the PptxReader -- I guess this is an error?
Is this update in the API or the web application?
I think just the API: pip install -U llama-parse
awesome; will check it out - thanks
Hey, I tried the API. Here's the error:
Attachment
screenshot_2024-03-20_135732.png
Aw man, I thought it was more robust than that. Maybe it wasn't fully rolled out...
@pld just curious, we have plans to expand that list, right?
Is there any branch that already handles .ppt?
yes, we should extend it.
@Logan M Are there any plans to bring this extended support to llama-index readers?
No plans currently, but I welcome a PR to add any new reader
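Until a dedicated .ppt reader exists, one possible workaround is to convert legacy files to .pptx before indexing. This is a rough sketch, assuming LibreOffice is installed and its soffice binary is on the PATH. The helper names and directory layout are made up for illustration and nothing here is part of llama-index.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(ppt_path, out_dir, soffice="soffice"):
    """Build the LibreOffice headless command that converts one file to .pptx."""
    return [
        soffice, "--headless",
        "--convert-to", "pptx",
        "--outdir", str(out_dir),
        str(ppt_path),
    ]


def convert_legacy_decks(input_dir, out_dir):
    """Convert every .ppt in input_dir and return the expected .pptx paths."""
    if shutil.which("soffice") is None:
        raise RuntimeError("LibreOffice (soffice) was not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    # "*.ppt" matches only the legacy extension, not ".pptx".
    for ppt in sorted(Path(input_dir).glob("*.ppt")):
        subprocess.run(build_convert_command(ppt, out_dir), check=True)
        converted.append(out_dir / (ppt.stem + ".pptx"))
    return converted
```

After conversion, pointing SimpleDirectoryReader at the output directory should pick the files up through the existing .pptx path. Conversion fidelity depends on LibreOffice, so it is worth spot-checking a few decks.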
And how is llama-parse handling the extended format(s)?
Is there any code I can look into?
Currently it's being provided as a SaaS service, with options for on-prem VPC deployments in the future. (So no source code for now.)