Hello! I tried asking kapa, but maybe I'm on ignore (or, more likely, I did it wrong, lol). Anyway, my question: many of the PDFs I'm trying to ingest contain streams encoded in various formats, for example `/Filter [ /ASCII85Decode /FlateDecode ] /Length 16927 /Subtype /Type1C >>`.
When I load these files, the ingested text is nonsense. How would I go about detecting and decoding these encodings while ingesting PDFs?
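For context on what those filters mean: a `/Filter` array lists decoders that are applied in the order given, so a stream like this is ASCII85 text that decodes to zlib/deflate-compressed bytes. The helper below, `decode_pdf_stream` (a hypothetical name, not part of any library), is a minimal standard-library sketch that handles only those two filters. A real PDF parser also handles the other filters, locates the raw stream bytes, and interprets content streams to extract text. Note also that `/Subtype /Type1C` marks an embedded CFF font program, so even once decoded, this particular stream would not contain readable document text.

```python
import base64
import zlib


def decode_pdf_stream(raw: bytes, filters: list[str]) -> bytes:
    """Apply PDF stream filters in the order they appear in /Filter.

    Sketch only: handles /ASCII85Decode and /FlateDecode, nothing else.
    """
    data = raw
    for name in filters:
        if name == "ASCII85Decode":
            data = data.strip()
            # PDF ASCII85 data ends with the "~>" end-of-data marker; "<~" is optional.
            if data.startswith(b"<~"):
                data = data[2:]
            if data.endswith(b"~>"):
                data = data[:-2]
            # a85decode ignores whitespace and expands "z" to four zero bytes.
            data = base64.a85decode(data)
        elif name == "FlateDecode":
            # FlateDecode is zlib/deflate compression.
            data = zlib.decompress(data)
        else:
            raise NotImplementedError(f"filter /{name} is not handled in this sketch")
    return data
```

Producing such a stream runs the filters in reverse, so compress first and then ASCII85-encode. For example, `decode_pdf_stream(base64.a85encode(zlib.compress(b"hi")) + b"~>", ["ASCII85Decode", "FlateDecode"])` gives back `b"hi"`.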
I think I figured out the problem. I was loading temp files that had no extension, so SimpleDirectoryReader was using the TextReader instead of the PDFReader for those files.
I'm switching to calling the readers directly, since I know the file type of each temp file.
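If the file type isn't known up front, another option is to give extensionless temp files a suffix before loading them, so that extension-based reader selection picks the right reader. The sketch below rests on one assumption: the only types that matter are PDF and plain text. It sniffs the `%PDF-` header that PDF files begin with. `sniff_suffix` and `add_sniffed_suffix` are hypothetical helper names.

```python
from pathlib import Path

# PDF files start with a header such as b"%PDF-1.7".
PDF_MAGIC = b"%PDF-"


def sniff_suffix(path: Path) -> str:
    """Guess a suffix from the file's first bytes (PDF vs. plain text only)."""
    with open(path, "rb") as f:
        head = f.read(1024)
    # Search the first 1 KiB rather than only byte 0, in case of stray leading bytes.
    return ".pdf" if PDF_MAGIC in head else ".txt"


def add_sniffed_suffix(path: Path) -> Path:
    """Rename an extensionless file to carry a sniffed suffix; return the new path."""
    if path.suffix:
        return path  # already has an extension, leave it alone
    return path.rename(path.with_suffix(sniff_suffix(path)))
```

The returned path can then be passed to `SimpleDirectoryReader(input_files=[str(new_path)])`, which chooses a reader by extension. Calling the readers directly, as above, avoids the rename entirely when the type is already known.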