Find answers from the community

rdx99.

Hey,
Getting an error, after many warnings like the one below, when using IngestionPipeline with parallelization; the transformations use the hierarchical node parser:
Plain Text
create_hierarchical_index_qdrant Creating pipeline
WARNINGS /python3.11/site-packages/llama_index/core/schema.py:94:__getstate__ Removing unpickleable private attribute _chunking_tokenizer_fn  
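For context, the warning seems to come from pickling the parser for the worker processes. A minimal sketch that may reproduce it on its own (assuming parallel mode pickles the transformations before sending them to workers):
Plain Text
import pickle

from llama_index.core.node_parser import HierarchicalNodeParser

# Pickling calls __getstate__, which strips private attributes it cannot
# serialize (e.g. tokenizer callables on the inner sentence splitters).
parser = HierarchicalNodeParser.from_defaults(chunk_sizes=[4096, 2048])
payload = pickle.dumps(parser)  # may emit the same "Removing unpickleable ..." warning
restored = pickle.loads(payload)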
1 comment
Is there an issue with IngestionPipeline when using parallel processing mode?
Plain Text
import logging
import os

from llama_index.core import Settings
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import HierarchicalNodeParser

logger = logging.getLogger(__name__)

transformations = [
    HierarchicalNodeParser.from_defaults(chunk_sizes=[4096, 2048]),
    Settings.embed_model,
]
logger.info("Creating pipeline")
pipeline = IngestionPipeline(transformations=transformations)
# pipeline.disable_cache = False
logger.info("Num workers: %s", os.cpu_count())

# Hangs here when num_workers is passed; runs (slowly) without it.
nodes = pipeline.run(
    documents=createHierarchicalIndexRequest.Documents,
    num_workers=4,
)


My pipeline doesn't return any error messages and doesn't execute anything after the pipeline.run() call. If I remove the num_workers argument it runs, but it's extremely slow. Any advice?
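For reference, here is a sketch of what I plan to try next: calling pipeline.run() only from a guarded entry point, since parallel mode spawns worker processes that re-import the module (my_documents and the worker count are placeholders, not a confirmed fix):
Plain Text
import os

from llama_index.core import Settings
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import HierarchicalNodeParser


def build_nodes(my_documents):
    pipeline = IngestionPipeline(
        transformations=[
            HierarchicalNodeParser.from_defaults(chunk_sizes=[4096, 2048]),
            Settings.embed_model,
        ]
    )
    # Conservative worker count; fall back to no num_workers if it still hangs.
    return pipeline.run(
        documents=my_documents,
        num_workers=min(4, os.cpu_count() or 1),
    )


if __name__ == "__main__":
    # Guarded entry point so spawned workers don't re-run module-level code.
    nodes = build_nodes(my_documents=[])  # replace with real documents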
13 comments
I tried a few things but it's still not working. Can anyone have a look? cc @Logan M
5 comments