
Updated 3 months ago

I am using the Parallelized Ingestion Pipeline, but I get this error: TypeError: cannot pickle '_thread.RLock' object.
My code is: nodes = pipeline.run(documents=documents, num_workers=4)
Please stand by @Ar1s as I search the knowledge base. I normally take about 20 seconds or so. :wave:
----------------------
The error you're encountering typically occurs when Python tries to pickle an object that cannot be serialized. In LlamaIndex's parallel execution, this could be caused by any object that has to be sent to the worker processes, such as the documents you are processing.

When num_workers is greater than 1, pipeline.run() in LlamaIndex executes in parallel: it uses multiprocessing.Pool to distribute batches of nodes across worker processes.
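A minimal sketch of that pattern, using only the standard library. This is not LlamaIndex's actual implementation, and `make_batches`, `process_batch`, and `run_in_parallel` are hypothetical names. The worker function, any arguments bound to it with `functools.partial`, and every batch are pickled when they are sent to a child process, which is where an unpicklable object would raise the error.

```python
from functools import partial
from multiprocessing import Pool


def make_batches(items, num_workers):
    # Split items into at most num_workers contiguous batches.
    size = max(1, -(-len(items) // num_workers))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_batch(batch, suffix):
    # Stand-in for a transformation; runs inside a worker process.
    return [item + suffix for item in batch]


def run_in_parallel(items, num_workers=4, suffix="!"):
    batches = make_batches(items, num_workers)
    # The partial (function plus bound arguments) and each batch are
    # pickled to reach the workers; an unpicklable value fails here.
    worker = partial(process_batch, suffix=suffix)
    with Pool(processes=num_workers) as pool:
        results = pool.map(worker, batches)
    # Flatten per-batch results back into one list, in input order.
    return [item for batch in results for item in batch]


if __name__ == "__main__":
    print(run_in_parallel(["a", "b", "c", "d", "e"], num_workers=2))
    # prints ['a!', 'b!', 'c!', 'd!', 'e!']
```

On platforms that start workers with the spawn method (the default on Windows and macOS), the `if __name__ == "__main__":` guard is required so child processes can import the module without starting the pool again.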

However, Python's multiprocessing module requires every object sent to a worker process to be picklable (serializable with the pickle module). Some objects, like the _thread.RLock in your error message, cannot be pickled and therefore cannot be passed directly to multiprocessing.
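The failure can be reproduced without LlamaIndex. In this sketch, `ClientWithLock` is a hypothetical stand-in for any object (an API client, a cache) that keeps a `threading.RLock` as an attribute; pickling it fails in the same way, while plain data pickles fine.

```python
import pickle
import threading


class ClientWithLock:
    """Hypothetical object that, like many API clients, owns a lock."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.RLock()


def try_pickle(obj):
    # Return None if obj pickles, otherwise the TypeError message.
    try:
        pickle.dumps(obj)
        return None
    except TypeError as exc:
        return str(exc)


print(try_pickle({"text": "hello", "page": 1}))  # None
print(try_pickle(ClientWithLock("embedder")))
# on recent Python 3 versions: cannot pickle '_thread.RLock' object
```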

To resolve this, check the objects in your documents and make sure each one can be pickled. If your documents contain complex objects, you may need to replace them with simpler data or define how they should be serialized before processing.
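One standard way to make such an object picklable is to define `__getstate__` and `__setstate__`, so the lock is dropped when the object is pickled and recreated when it is unpickled. `CachedLookup` is a hypothetical example, not a LlamaIndex class.

```python
import pickle
import threading


class CachedLookup:
    """Hypothetical object holding a lock that cannot be pickled."""

    def __init__(self, data):
        self.data = dict(data)
        self._lock = threading.RLock()

    def get(self, key):
        with self._lock:
            return self.data.get(key)

    def __getstate__(self):
        # Copy the instance dict and drop the unpicklable lock.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the picklable fields, then create a fresh lock
        # in whichever process unpickles the object.
        self.__dict__.update(state)
        self._lock = threading.RLock()


original = CachedLookup({"a": 1})
restored = pickle.loads(pickle.dumps(original))
print(restored.get("a"))  # 1
```

Each process ends up with its own fresh lock, so this only suits locks that guard per-process state, not locks meant to coordinate between processes.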

If the issue persists, it would be helpful to have more information about the structure and content of your documents.
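To gather that information, a small recursive check can narrow down which field fails to pickle. `find_unpicklable` is a hypothetical helper, not part of LlamaIndex; it only descends into dicts, lists, tuples, and plain object attributes, and `Doc` is an illustrative stand-in for a document with unpicklable metadata.

```python
import pickle
import threading


def find_unpicklable(obj, path="obj", max_depth=3):
    """Hypothetical helper: return paths to values that fail to pickle."""
    try:
        pickle.dumps(obj)
        return []  # everything under this path pickles fine
    except Exception:
        pass
    # Look one level deeper to narrow the failure down.
    if isinstance(obj, dict):
        children = [(f"{path}[{k!r}]", v) for k, v in obj.items()]
    elif isinstance(obj, (list, tuple)):
        children = [(f"{path}[{i}]", v) for i, v in enumerate(obj)]
    elif hasattr(obj, "__dict__"):
        children = [(f"{path}.{k}", v) for k, v in vars(obj).items()]
    else:
        children = []
    if max_depth == 0 or not children:
        return [path]
    found = []
    for child_path, child in children:
        found.extend(find_unpicklable(child, child_path, max_depth - 1))
    # If no single child failed, the object itself is the culprit.
    return found or [path]


class Doc:
    """Illustrative document whose metadata holds a lock."""

    def __init__(self):
        self.text = "hello"
        self.metadata = {"source": "a.txt", "lock": threading.RLock()}


print(find_unpicklable(Doc(), "doc"))  # ["doc.metadata['lock']"]
```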
----------------------
Note: you can ask me a follow-up question by @ mentioning me again :speech_balloon:
----------------------