

Weird Behavior with SimpleDirectoryReader and load_data()

Hello everyone! I've run into some weird behavior (probably a bug) with `SimpleDirectoryReader` and `load_data()`: when I try to attach file metadata while loading with multiple workers, it fails.
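```python
from llama_index.core import SimpleDirectoryReader  # older releases: from llama_index import SimpleDirectoryReader

# get_meta(path) is my function that returns a metadata dict for each file
reader = SimpleDirectoryReader(input_dir=dir, file_metadata=get_meta)
documents = reader.load_data(num_workers=4)
```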

results in:
`can't get attribute 'get_meta' on <module '__main__'>`

With `num_workers=1`, it works fine.

Has anyone else had the same issue?
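Some context on the error message: Python's `pickle` stores a plain function only as a reference (module name plus function name), not as code, so a separate worker process can only rebuild it if it can find that name itself. The stdlib-only sketch below simulates such a worker with a fresh interpreter (`python -c`). It assumes, without checking LlamaIndex's internals, that `load_data(num_workers=4)` hands `file_metadata` to its workers via pickle, as the error suggests; `WORKER` is a hypothetical helper for this demo only.

```python
import pickle
import subprocess
import sys

# Hypothetical stand-in for a worker: a fresh interpreter that reads a pickled
# callable from stdin, rebuilds it, and calls it on a file path.
WORKER = (
    "import pickle, sys\n"
    "func = pickle.loads(sys.stdin.buffer.read())\n"
    "print(func('example.txt'))\n"
)


def get_meta(path):
    return {"foo": "bar", "path": path}


# pickle records only where to find the function ("__main__", "get_meta"),
# not the function's code.
payload = pickle.dumps(get_meta)

# In the worker, __main__ is the -c script, which defines no get_meta,
# so unpickling fails (on CPython 3.11, with an AttributeError naming get_meta).
result = subprocess.run(
    [sys.executable, "-c", WORKER], input=payload, capture_output=True
)
print("exit code:", result.returncode)
print(result.stderr.decode())
```

With `num_workers=1` nothing needs to cross a process boundary, which would be consistent with the single-worker case working.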
Hey, your `get_meta` is a callable function, right?

If it just returns a fixed dict, I'd do it this way:
`get_meta = lambda extra_info: {"key": "val"}`
Yes, it's just `def get_meta(path): return {"foo": "bar", "path": path}`
lemme try that
doesn't work
similar issue with multiprocessing
`can't pickle <function <lambda> at 0xblahblah>: attribute lookup <lambda> on __main__ failed`
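The lambda version fails for a related reason: pickle saves a function by its module and qualified name, and a lambda's qualified name is `<lambda>`, which can't be looked up, so it can't be pickled at all, whatever variable it is assigned to. A quick stdlib-only check (no LlamaIndex involved; `named_get_meta` is just an illustrative name):

```python
import pickle

# The lambda suggested above; its __qualname__ is "<lambda>".
get_meta = lambda extra_info: {"key": "val"}  # noqa: E731


def named_get_meta(path):
    return {"foo": "bar", "path": path}


# A def'd, module-level function round-trips: pickle finds it again by name.
named_ok = pickle.loads(pickle.dumps(named_get_meta)) is named_get_meta

# A lambda does not: pickle tries to look up "<lambda>" on the module and fails.
try:
    pickle.dumps(get_meta)
    lambda_ok = True
except (pickle.PicklingError, AttributeError) as exc:
    lambda_ok = False
    print(type(exc).__name__, exc)

print("named function picklable:", named_ok)
print("lambda picklable:", lambda_ok)
```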
OK, I think I found the solution; lemme try it.
yup, this worked
Multiprocessing is weird af: I had to define the function in a separate file (module) and then import it into the main script.
For reference, this happened on Python 3.11.4 under WSL2 (latest Debian).
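Why the separate file helps can be sketched under one assumption, not confirmed in the thread: that the workers rebuild `file_metadata` by unpickling it. Once `get_meta` lives in an importable module, the pickled reference points at that module instead of `__main__`, and a fresh process can import it. The module name `meta_utils` and the temp-directory setup are hypothetical, only there to keep the demo in one file; the worker is simulated with `python -c`.

```python
import os
import pickle
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical module, written to a temp dir only to keep this demo self-contained.
# In a real project this is simply a meta_utils.py file next to your script.
module_dir = tempfile.mkdtemp()
Path(module_dir, "meta_utils.py").write_text(
    "def get_meta(path):\n"
    "    return {'foo': 'bar', 'path': path}\n"
)
sys.path.insert(0, module_dir)

from meta_utils import get_meta  # noqa: E402

# Simulated worker: a fresh interpreter that unpickles a callable and calls it.
WORKER = (
    "import pickle, sys\n"
    "func = pickle.loads(sys.stdin.buffer.read())\n"
    "print(func('example.txt'))\n"
)

# The pickled reference is now ("meta_utils", "get_meta"), not __main__.
payload = pickle.dumps(get_meta)

# The worker must be able to import meta_utils, so pass the module's directory
# on PYTHONPATH (multiprocessing's spawn method copies the parent's sys.path).
env = dict(os.environ, PYTHONPATH=module_dir)
result = subprocess.run(
    [sys.executable, "-c", WORKER], input=payload, capture_output=True, env=env
)
print(result.stdout.decode().strip())  # {'foo': 'bar', 'path': 'example.txt'}
```

In an actual project this amounts to a `meta_utils.py` next to the main script, `from meta_utils import get_meta`, and `SimpleDirectoryReader(..., file_metadata=get_meta)`. Keeping the `load_data(num_workers=4)` call under `if __name__ == "__main__":` is also the usual precaution whenever worker processes are started with the spawn method.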
I hate multiprocessing in Python lol. I almost want to rip this feature out sometimes. Glad it works!