Is there an issue in IngestionPipeline when using parallel processing mode?
transformations = [
HierarchicalNodeParser.from_defaults(chunk_sizes=[4096, 2048]),
Settings.embed_model,
]
logger.info("Creating pipeline")
pipeline = IngestionPipeline(transformations=transformations)
# pipeline.disable_cache = False
logger.info('Num workers: ' + str(os.cpu_count()))
nodes = pipeline.run(
documents=createHierarchicalIndexRequest.Documents,
num_workers=4,
)
My pipeline doesn't return any err messages nor executes further after
pipeline.run()
call. If I remove
num_workers
arg it runs but its extremely slow, any advice?