
Updated 6 months ago

I am using Parallelized Ingestion Pipeline, but got error: TypeError: cannot pickle '_thread.RLock' object

At a glance

The community member is using a Parallelized Ingestion Pipeline and hit the error TypeError: cannot pickle '_thread.RLock' object. This error is raised when Python tries to pickle an object that cannot be serialized. With LlamaIndex's parallel execution, the cause may be objects in the documents being processed.

The community member is advised to revise the objects in their documents to ensure they are serializable. If the documents contain complex objects, they may need to simplify them or provide a method to serialize them before processing. If the issue persists, more information about the structure and content of the documents would be helpful.

I am using the Parallelized Ingestion Pipeline, but I get this error: TypeError: cannot pickle '_thread.RLock' object.
My code is: nodes = pipeline.run(documents=documents, num_workers=4)
----------------------
This error is raised when Python tries to pickle an object that cannot be serialized. With LlamaIndex's parallel execution, the cause may be objects in the documents you are processing.

The pipeline.run() function in LlamaIndex runs in parallel processes when num_workers is set to a value greater than 1. It uses multiprocessing.Pool to distribute batches of nodes across processors.

However, Python's multiprocessing module requires that every object sent to a worker process be picklable. Some objects, such as the _thread.RLock object named in your error message, cannot be pickled and so cannot be passed through multiprocessing directly.
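
You can reproduce the failure without LlamaIndex at all. The following is a minimal sketch using only the standard library; `payload` and `pickle_error` are hypothetical names, and the dictionary only stands in for a document field that happens to hold a live lock.

```python
import pickle
import threading

# Hypothetical stand-in for a document field that holds a live lock.
payload = {"text": "hello", "lock": threading.RLock()}


def pickle_error(obj):
    """Return None if obj pickles cleanly, otherwise the error message."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return str(exc)
    return None


print(pickle_error({"text": "hello"}))  # plain data pickles without error
print(pickle_error(payload))            # the lock makes pickling fail
```

If the second call reports the same message as your traceback, the problem is a lock (or an object that owns one) somewhere in the data handed to the workers.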

To resolve this issue, you might need to revise the objects in your documents to ensure they are serializable. If your documents contain complex objects, you might need to simplify them or provide a method to serialize them before processing.
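
One way to "provide a method to serialize" such an object is Python's `__getstate__` / `__setstate__` hooks. This is only a sketch under the assumption that you control the class in question; `LockedResource` is a hypothetical class, not part of LlamaIndex.

```python
import pickle
import threading


class LockedResource:
    """Hypothetical object that holds a lock but can still be pickled."""

    def __init__(self, text):
        self.text = text
        self._lock = threading.RLock()

    def __getstate__(self):
        # Copy the instance dict and drop the lock before pickling.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes and create a fresh lock.
        self.__dict__.update(state)
        self._lock = threading.RLock()


restored = pickle.loads(pickle.dumps(LockedResource("hello")))
print(restored.text)
```

The restored copy gets a new, unlocked lock of its own, so this approach only fits cases where the lock guards per-object state and does not need to be shared between processes.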

If the issue persists, it would be helpful to have more information about the structure and content of your documents.
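
To gather that information, you could try pickling each piece separately and see which one fails. The sketch below is an assumption about how you might do this; `find_unpicklable` and the `candidates` dictionary are hypothetical, and in practice the values would be your own documents and possibly the pipeline's transformations, which may also need to be sent to the workers.

```python
import pickle
import threading


def find_unpicklable(items):
    """Return (name, error message) pairs for values that pickle rejects."""
    problems = []
    for name, value in items.items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # pickle can raise several exception types
            problems.append((name, f"{type(exc).__name__}: {exc}"))
    return problems


# Hypothetical inputs; replace these with your own objects.
candidates = {
    "text": "some document text",
    "metadata": {"source": "example.txt"},
    "client": threading.RLock(),
}
print(find_unpicklable(candidates))
```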
----------------------