Is there a way to pre package my app

At a glance

The community member is asking if there is a way to pre-package their app with the needed loaders, instead of having to call download_loader and download them at runtime. The comments discuss that while some loaders are already available in the repository, others still need to be downloaded. There is a suggestion to execute a script that calls download_loader during a Docker build step, but the community member notes that this won't work because they are calling the Python script directly from Ruby, so there is no persistent Python process.

The community member explains that they need the loaders to be pre-bundled because the downloads will be redundant in their use case, where they are selecting the appropriate parser based on the file's MIME type, not the extension. They are using loaders like PDFReader and DocxReader instead of the built-in SimpleDirectoryReader for this purpose.

The community members discuss the difference between loaders and readers, and why the loaders are not just packed modules within the llama_index library. One community member suggests that this is likely due to the loaders being in a separate repository, allowing users to use the loaders without having to install the entire llama_index library.

walid

Is there a way to pre-package my app with the needed loaders? Instead of calling download_loader, I want to install them directly with pip and import them at runtime, without needing to download anything.

14 comments

walid

As far as I understand, we can directly import them from llama_index.readers.XXXreader. Is my understanding correct?

Logan M

I thiiiink some of the loaders still need to be downloaded. Although there are quite a few already in the repo. https://github.com/jerryjliu/llama_index/tree/b752cdf61d83bd194e10b0a684d36716edce76af/gpt_index/readers

You could call download_loader when your app starts up no? Or if it's docker, you can add that as a sort of build step

walid

How would you achieve that with a build step if it is a runtime import?

walid

"Although there are quite a few already in the repo."

Is there a difference between loaders and readers?

Logan M

I thiiink they are the same, judging by the code lol

walid

Yeah same but just wanted to make sure 😅
I don't fully understand why do we need to 'download' loaders and why aren't they just packed modules within llama index that we can directly import

Logan M

You could execute a dummy script that calls download_loader inside of it, during a RUN step in your Dockerfile

Plain Text

RUN pip install -r requirements.txt && python ./download_loaders.py && ...
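A minimal sketch of what such a download_loaders.py might contain. The file name comes from the RUN line above; the function name `prefetch_loaders`, the `LOADER_NAMES` list, and the `from llama_index import download_loader` import path are assumptions for illustration and may need adjusting to the installed version.

```python
# download_loaders.py: hypothetical build-time script that warms the loader cache.

# Loader names mentioned in this thread; adjust to whatever the app actually uses.
LOADER_NAMES = ["PDFReader", "DocxReader", "PagedCSVReader"]


def prefetch_loaders(names, download=None):
    """Call `download` once per loader name and return the names processed.

    `download` defaults to llama_index's download_loader. It is a parameter
    so the function can be exercised without network access.
    """
    if download is None:
        # Assumed import path; imported lazily so this module loads even
        # where llama_index is not installed.
        from llama_index import download_loader as download

    fetched = []
    for name in names:
        download(name)  # side effect: the loader is cached on disk
        fetched.append(name)
    return fetched


def main():
    fetched = prefetch_loaders(LOADER_NAMES)
    print("Prefetched loaders:", ", ".join(fetched))

# The real script would end with a call to main() so that
# `python ./download_loaders.py` does the work during the image build.
```

Because the download happens while the image is built, later short-lived Python processes started from that image should find the loaders already cached, provided the cache location is part of the image.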

Logan M

I think it's just because they are separate repos, and people can use the loaders without having to install llama index.

I'm pretty sure the loaders repo came first (llama hub), but might just be tech debt at this point too

walid

Makes sense in that case, I'll try fiddling around with the code to make it work

walid

The suggestions won't work because I'm calling the Python script directly from Ruby, so there is no persistent Python process

Therefore, I want it pre-packed so that I can just import it

Logan M

Is it a problem to wait for it to download/install? Should only happen on the first import.

Either way though, @jerryjliu0 I'm not sure if there's a way to pre-package it is there?

walid

Well there's no problem, but since it is not a persistent process, it will download every time I index anything, which is very redundant

I can't use the built-in file readers either, because they're different from the loaders (I'm speaking about PDFReader vs PDFParser)

Why am I using PDFReader and others instead of the SimpleDirectoryReader? It is because I need to select the appropriate parser based on the mimetype of the file, not the extension, as a safeguard to my application.

So I'm doing something like this:

Plain Text

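import magic  # provided by the python-magic package
from llama_index import download_loader

# Map each supported MIME type to the name of the loader that handles it.
MIME_TYPES_TO_LOADERS = {
    "application/pdf": "PDFReader",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "DocxReader",
    "application/msword": "DocxReader",
    "text/csv": "PagedCSVReader",
    "text/plain": "SimpleDirectoryReader"
}

# file_path (defined elsewhere) is the file being indexed.
# Detect the MIME type from the file's contents, not its extension.
mime = magic.from_file(file_path, mime=True)

# Fetch the loader class registered for that MIME type.
loader = download_loader(MIME_TYPES_TO_LOADERS[mime])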

So preferably, I need those prebundled.
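One detail of the lookup above is that a MIME type missing from the dictionary raises a bare KeyError. A small sketch of a guarded lookup follows, reusing the mapping from the post; the helper name `loader_name_for` is hypothetical and not part of llama_index.

```python
# Mapping copied from the post above.
MIME_TYPES_TO_LOADERS = {
    "application/pdf": "PDFReader",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "DocxReader",
    "application/msword": "DocxReader",
    "text/csv": "PagedCSVReader",
    "text/plain": "SimpleDirectoryReader",
}


def loader_name_for(mime_type, mapping=MIME_TYPES_TO_LOADERS):
    """Return the loader name for a MIME type, or raise a descriptive error."""
    try:
        return mapping[mime_type]
    except KeyError:
        # Reject unknown types explicitly, in line with using the MIME type
        # as a safeguard rather than trusting the file extension.
        raise ValueError(f"Unsupported MIME type: {mime_type!r}") from None
```

The returned name can then be passed to download_loader as in the original snippet, and the Ruby caller gets a readable error message for files the app does not support.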

jerryjliu0

@walid when the module is downloaded, it gets downloaded to your llama_index.readers.llamahub_modules directory (within the installed package), and it gets cached there. so downloads are only triggered again if you explicitly set refresh_cache=True in the download_loader call
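To check whether a loader is already cached before a short-lived process starts, something like the following sketch could be used. It assumes the cache sits at readers/llamahub_modules inside the installed llama_index package, as described in the comment above; that layout may differ between versions, and both helper names are hypothetical.

```python
import importlib.util
from pathlib import Path


def llamahub_cache_dir(package_name="llama_index"):
    """Return the assumed cache directory, or None if the package is absent."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return None
    package_root = Path(next(iter(spec.submodule_search_locations)))
    # Assumed layout, taken from the comment above.
    return package_root / "readers" / "llamahub_modules"


def cached_entries(cache_dir):
    """List the entries in the cache directory, skipping dunder files."""
    if cache_dir is None or not Path(cache_dir).is_dir():
        return []
    return sorted(
        p.name for p in Path(cache_dir).iterdir() if not p.name.startswith("__")
    )
```

Calling `cached_entries(llamahub_cache_dir())` after the first download_loader call should show what was stored. If the list is empty in every new process, the cache directory is probably not being persisted between runs.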

jerryjliu0

we do this just so users have the ability to decide which loader to use (at our current scale of ~80 it's ok, but this may expand if we start having 100s to 1000s of loaders)
