maybe goats dont exist
9 months ago
Do y'all have any tips on improving file ingestion speed? I'm only using a node parser and embeddings, but large files are still quite slow.
14 comments
WhiteFang_Jr
9 months ago
You could try parallel ingestion:
https://docs.llamaindex.ai/en/stable/module_guides/loading/ingestion_pipeline/root.html#parallel-processing
Logan M
9 months ago
also, you can increase the batch size on embeddings (especially if you are using API-based embeddings like OpenAI)
maybe goats dont exist
9 months ago
Can node_processor be parallelized?
maybe goats dont exist
9 months ago
I'll increase that for sure. I think the node parser is the slow part right now
Logan M
9 months ago
It can be, using the above example
maybe goats dont exist
9 months ago
I don't use an ingestion pipeline as it doesn't work for some reason lol. Can I just provide num_workers?
Logan M
9 months ago
nope, because we can't multiprocess that low-level (too many un-picklable errors)
Logan M
9 months ago
I can help you set up an ingestion pipeline
Logan M
9 months ago
it should be fairly easy
maybe goats dont exist
9 months ago
Will parallelizing the ingestion pipeline help, then, if all I'm doing is node parsing and embedding and the node parser can't be parallelized?
Logan M
9 months ago
Each step in an ingestion pipeline can be parallelized
Logan M
9 months ago
including the node parser
Logan M
9 months ago
(but it can't happen directly in the node parser, long story)
Logan M
9 months ago
Try it out, it will make sense hopefully