There's no problem as such, but since my process is not persistent, the loader gets downloaded again every time I index anything, which is very redundant.
I cannot use the built-in file readers either, because they're different from the loaders (I'm talking about PDFReader vs PDFParser).
Why am I using PDFReader and the others instead of SimpleDirectoryReader? Because I need to select the appropriate parser based on the MIME type of the file, not its extension, as a safeguard for my application.
So I'm doing something like this:
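import magic  # python-magic
from llama_index import download_loader  # import path may differ between llama_index versions

# Map detected MIME types to loader names
MIME_TYPES_TO_LOADERS = {
    "application/pdf": "PDFReader",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "DocxReader",
    "application/msword": "DocxReader",
    "text/csv": "PagedCSVReader",
    "text/plain": "SimpleDirectoryReader",
}

# Detect the MIME type from the file contents, not the extension
mime = magic.from_file(file_path, mime=True)
loader = download_loader(MIME_TYPES_TO_LOADERS[mime])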
So ideally, I'd like those loaders to come prebundled.
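To make the request concrete, here is a rough sketch of the dispatch I have in mind if the reader classes could be imported directly instead of downloaded. Everything named here (`select_reader`, `UnsupportedMimeTypeError`, the `registry` and `detect_mime` parameters) is hypothetical and only for illustration; none of it is part of any library's API.

```python
from typing import Callable, Dict


class UnsupportedMimeTypeError(ValueError):
    """Raised when no reader is registered for the detected MIME type."""


def select_reader(
    file_path: str,
    registry: Dict[str, Callable[[], object]],
    detect_mime: Callable[[str], str],
) -> object:
    """Return a reader instance chosen by the file's detected MIME type.

    registry maps MIME type strings to zero-argument factories
    (for example, reader classes imported once at module load).
    detect_mime is any callable that maps a path to a MIME type string.
    Both are hypothetical parameters for this sketch.
    """
    mime = detect_mime(file_path)
    try:
        factory = registry[mime]
    except KeyError:
        # Fail loudly instead of guessing a reader for an unknown type.
        raise UnsupportedMimeTypeError(
            f"No reader registered for MIME type {mime!r} ({file_path})"
        ) from None
    return factory()
```

With python-magic, `detect_mime` could be `lambda p: magic.from_file(p, mime=True)`, as in my snippet above, and `registry` would hold the reader classes themselves rather than names passed to `download_loader`. Nothing would be fetched at index time, and an unexpected MIME type raises an explicit error instead of a bare `KeyError`.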