How can I do parallel processing on IngestionPipelines?

My conversations have as many as 200 documents with as many as 800 pages, so I need to preprocess data before my customers can start a conversation.

I’ve scoured the docs/code but haven’t found a way to run multiple pipeline calls at once. I’m currently using asyncio.gather over documents and then pages, calling pipeline.arun for each page, but my results still appear to be sequential…
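
Roughly what that looks like (a simplified sketch of my current approach; preprocess_page and pages_by_doc are placeholder names):

Python
import asyncio

async def preprocess_page(pipeline, page_docs):
    # one pipeline call per page
    return await pipeline.arun(documents=page_docs)

async def preprocess_all(pipeline, pages_by_doc):
    # fan out every page of every document and await them together
    tasks = [
        preprocess_page(pipeline, page)
        for pages in pages_by_doc
        for page in pages
    ]
    return await asyncio.gather(*tasks)

Timings from a test run: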

Plain Text
Processed 6 documents in 130.94 seconds
Total number of pages processed: 6
Average time per document: 21.82 seconds
Average time per page: 21.50 seconds
Doc 4 took 16.62 seconds
  Page 1 took 14.89 seconds
Doc 2 took 39.05 seconds
  Page 1 took 38.35 seconds
Doc 6 took 38.55 seconds
  Page 1 took 37.80 seconds
Doc 5 took 93.89 seconds
  Page 1 took 85.75 seconds
Doc 1 took 129.99 seconds
  Page 1 took 128.76 seconds
Doc 3 took 130.94 seconds
  Page 1 took 129.01 seconds


If this test conversation of 6 docs / 6 pages (all small text) averaged ~20 seconds per page, then the entire job should only take ~20 seconds when the pages actually run in parallel, right? Any recs on how to make this work?
tryna make these scream
[Attachment: image.png]
Async is more about concurrency, i.e. letting several API calls go out at once.

If you aren't using API-based embeddings or LLMs, you probably won't notice any speedup with async.

If you are using API-based models, try increasing the num_workers kwarg on any metadata extractors.
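
For example (a sketch, assuming the 0.9.x import paths; the chunk size, worker count, and "data" directory are placeholders):

Python
from llama_index import SimpleDirectoryReader
from llama_index.extractors import TitleExtractor
from llama_index.ingestion import IngestionPipeline
from llama_index.node_parser import SentenceSplitter

documents = SimpleDirectoryReader("data").load_data()

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=512),
        # num_workers controls how many extraction calls go out concurrently
        TitleExtractor(num_workers=8),
    ]
)
nodes = pipeline.run(documents=documents)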
Would love to test the PR! Is it released yet?
P.S. I had num_workers=8 set for the above runs
It's merged into main, I'm about to cut a release 🙂
Ready when you are!
v0.9.29 is out 😉
Is it still deploying?

Plain Text
(llama-app-backend-py3.11) joshuasabol@Joshuas-MacBook-Pro-2 backend % poetry add llama-index@0.9.29

Could not find a matching version of package llama-index
It might be -- lemme check
hmmm got an error on publish I see
Reading https://github.com/run-llama/llama_index/pull/9920

NOTE: I didn't encounter the cannot pickle CoreBPE error. It seems that moving parallelization up to the run method has resulted in not needing to pickle lower-level imports. If we had defined tokenizer here directly like we have in SentenceSplitter, then that's when we'd see the error and need the fix. The same goes for the partial fix for lambda funcs not being pickle-able — that fix is no longer necessary here.

Does this mean SentenceSplitter is not parallelizable? I'm using it in my pipeline
No, it is parallelizable -- since the pipeline splits the jobs one level higher, it works.
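
Something like this should split the documents across worker processes (a sketch; assumes the num_workers kwarg that v0.9.29 adds to run(), with placeholder paths and settings):

Python
from llama_index import SimpleDirectoryReader
from llama_index.ingestion import IngestionPipeline
from llama_index.node_parser import SentenceSplitter

if __name__ == "__main__":
    documents = SimpleDirectoryReader("data").load_data()

    pipeline = IngestionPipeline(
        transformations=[SentenceSplitter(chunk_size=512)]
    )
    # num_workers > 1 farms batches of documents out to worker processes,
    # so SentenceSplitter (and any other transformations) run in parallel
    nodes = pipeline.run(documents=documents, num_workers=4)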
ok, now it published lol
I keep getting this error: cannot pickle '_asyncio.Task' object
Can you share a code sample that reproduces that?
Still trying to figure out why it's not working within my chat app, but I got it working via a script, and WOW this is going to be a huge time saver

TYSM @Logan M, @jerryjliu0, et al.
[Attachment: image.png]
WOW that's an increase haha amazing
We made em scream alright haha
[Attachment: image.png]
@Logan M -- jw, do any of the PDF loaders (ex. PDFReader) support parallel processing?