maybe goats dont exist
9 months ago
Do y'all have any tips on improving file ingestion speed? I'm only using a node parser and embeddings, but large files are still quite slow.
14 comments
WhiteFang_Jr
9 months ago
You could try parallel ingestion:
https://docs.llamaindex.ai/en/stable/module_guides/loading/ingestion_pipeline/root.html#parallel-processing
Logan M
9 months ago
also, you can increase the batch size on embeddings (especially if you are using API-based embeddings like OpenAI)
maybe goats dont exist
9 months ago
Can node_processor be parallelized?
maybe goats dont exist
9 months ago
I'll increase that for sure. I think the node parser is the slow part right now
Logan M
9 months ago
It can be, using the above example
maybe goats dont exist
9 months ago
I don't use an ingestion pipeline as it doesn't work for some reason lol. Can I just provide num_workers?
Logan M
9 months ago
nope, because we can't multiprocess that low-level (too many un-picklable errors)
Logan M
9 months ago
I can help you set up an ingestion pipeline
Logan M
9 months ago
it should be fairly easy
maybe goats dont exist
9 months ago
Will parallelizing the ingestion pipeline help, then, if all I'm doing is node parsing and embedding and the node parser can't be parallelized?
Logan M
9 months ago
Each step in an ingestion pipeline can be parallelized
Logan M
9 months ago
including the node parser
Logan M
9 months ago
(but it can't happen directly in the node parser, long story)
Logan M
9 months ago
Try it out, it will make sense hopefully