Updated 2 years ago

Is there a way to pre package my app

Is there a way to pre-package my app with the needed loaders? Instead of calling download_loader, I want to install them directly with pip and import them at runtime, without needing to download anything.
14 comments
As far as I understand, we can import them directly from llama_index.readers.XXXreader. Is my understanding correct?
I thiiiink some of the loaders still need to be downloaded. Although there are quite a few already in the repo. https://github.com/jerryjliu/llama_index/tree/b752cdf61d83bd194e10b0a684d36716edce76af/gpt_index/readers

You could call download_loader when your app starts up no? Or if it's docker, you can add that as a sort of build step
How would you achieve that with a build step if it is a runtime import?
Is there a difference between loaders and readers?
I thiiink they are the same, judging by the code lol
Yeah same but just wanted to make sure 😅
I don't fully understand why we need to 'download' loaders, and why they aren't just packaged modules within llama index that we can import directly
You could execute a dummy script that calls download_loader inside of it, during the RUN in your docker file

RUN pip install -r requirements.txt && python ./download_loaders.py && ...
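A minimal sketch of what the download_loaders.py script mentioned above might contain. The file contents are hypothetical: the helper name prefetch_loaders and the LOADER_NAMES list are illustrative, and the import path for download_loader is an assumption based on the llama_index versions discussed in this thread.

```python
# download_loaders.py (hypothetical contents, not from the thread)
from typing import Callable, Dict, Iterable

# Loader names taken from the mapping posted later in this thread.
LOADER_NAMES = ["PDFReader", "DocxReader", "PagedCSVReader"]


def prefetch_loaders(
    names: Iterable[str], download: Callable[[str], type]
) -> Dict[str, type]:
    """Call `download` once per loader name so each loader is fetched and cached."""
    return {name: download(name) for name in names}


def main() -> None:
    # Assumption: download_loader is importable from the top-level package.
    from llama_index import download_loader

    prefetch_loaders(LOADER_NAMES, download_loader)
```

Calling main() under an `if __name__ == "__main__":` guard at the bottom of the file would let the RUN step above execute it. The step has to run after pip install in the same image, so that the downloaded loaders end up in the image rather than being fetched at runtime.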
I think it's just because they are separate repos, and people can use the loaders without having to install llama index.

I'm pretty sure the loaders repo came first (llama hub), but might just be tech debt at this point too
Makes sense in that case, I'll try fiddling around with the code to make it work
The suggestions won't work because I'm calling the python script directly from ruby, so there is no persistent python process

Therefore, I want it pre-packed so that I can just import it
Is it a problem to wait for it to download/install? Should only happen on the first import.

Either way though, @jerryjliu0 I'm not sure if there's a way to pre-package it is there?
Well there's no problem, but since it is not a persistent process, it will download every time I index anything, which is very redundant

I cannot use the built-in file readers as well because they're different from the loaders (I'm speaking about PDFReader vs PDFParser)

Why am I using PDFReader and others instead of the SimpleDirectoryReader? It is because I need to select the appropriate parser based on the mimetype of the file, not the extension, as a safeguard to my application.

So I'm doing something like this:

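import magic  # python-magic: detects the MIME type from file contents

# Import path as of the llama_index versions discussed in this thread
from llama_index import download_loader

# Maps a detected MIME type to the name of the loader to fetch
MIME_TYPES_TO_LOADERS = {
    "application/pdf": "PDFReader",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "DocxReader",
    "application/msword": "DocxReader",
    "text/csv": "PagedCSVReader",
    "text/plain": "SimpleDirectoryReader"
}

# file_path is the path of the file being indexed (set elsewhere)
mime = magic.from_file(file_path, mime=True)

loader = download_loader(MIME_TYPES_TO_LOADERS[mime])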


So preferably, I need those prebundled.
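As a side note on the MIME-type dispatch above: indexing the dict directly raises a bare KeyError for any type that is not listed. A small sketch of a lookup helper that fails with a clearer message; the function name loader_name_for is hypothetical, and the normalization step is an extra precaution rather than something the thread requires.

```python
from typing import Mapping

# Same mapping as posted above.
MIME_TYPES_TO_LOADERS = {
    "application/pdf": "PDFReader",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "DocxReader",
    "application/msword": "DocxReader",
    "text/csv": "PagedCSVReader",
    "text/plain": "SimpleDirectoryReader",
}


def loader_name_for(
    mime: str, table: Mapping[str, str] = MIME_TYPES_TO_LOADERS
) -> str:
    """Return the loader name for a MIME type, or raise ValueError if unsupported."""
    # Drop any parameters (e.g. "; charset=utf-8") and normalize case.
    key = mime.split(";", 1)[0].strip().lower()
    try:
        return table[key]
    except KeyError:
        raise ValueError(f"Unsupported MIME type: {mime!r}") from None
```

The returned name can then be passed to download_loader as in the snippet above, so unsupported uploads are rejected before any download is attempted.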
@walid when the module is downloaded, it gets downloaded to your llama_index.readers.llamahub_modules directory (within the installed package), and it gets cached there. so downloads are only triggered again if you explicitly set refresh_cache=True in the download_loader call
we do this just so users have the ability to decide which loader to use (at our current scale of ~80 it's ok, but this may expand if we start having 100s to 1000s of loaders)
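To check whether the caching described above is taking effect in a given environment, one option is to look at the cache directory directly. A rough sketch using only the standard library: the module path comes from the reply above, while the helper names and the assumption that each cached loader appears as an entry in that directory are illustrative guesses, not confirmed details of llama_index.

```python
import importlib.util
from pathlib import Path
from typing import List, Optional

# Module path as named in the reply above.
CACHE_PACKAGE = "llama_index.readers.llamahub_modules"


def cached_loader_dir(package: str = CACHE_PACKAGE) -> Optional[Path]:
    """Return the on-disk directory for `package`, or None if it cannot be found."""
    try:
        spec = importlib.util.find_spec(package)
    except ImportError:
        # The parent package is not installed or failed to import.
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(next(iter(spec.submodule_search_locations)))


def list_cached_loaders(package: str = CACHE_PACKAGE) -> List[str]:
    """List entry names in the cache directory (assumed layout), or [] if absent."""
    directory = cached_loader_dir(package)
    if directory is None or not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if not p.name.startswith("_"))
```

If the list is non-empty after a first run and a later call still downloads, the process may be running from a different environment or a fresh container each time.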