Hi team, I have a quick question. I'm trying to use RagDatasetGenerator or DatasetGenerator to generate an eval dataset. However, both of these classes use fully parallel generation via async_module.gather, which overwhelms the OpenAI API and causes mass rate limiting.
One way I can overcome this is to subclass one of these two classes, override the _agenerate_dataset method, and run the requests in sequence instead of in parallel. But is there a more elegant way of doing this than subclassing and hacking it in?
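For concreteness, the override would basically boil down to awaiting the generation coroutines one at a time. Something like this toy sketch, where fake_llm_call is just a hypothetical stand-in for the real per-node generation call:

```python
import asyncio


async def fake_llm_call(i: int) -> str:
    # Hypothetical stand-in for one question-generation request.
    await asyncio.sleep(0.1)
    return f"response {i}"


async def generate_sequentially(n: int) -> list:
    # Await each request one at a time so only a single call is ever
    # in flight, instead of firing them all via async_module.gather(*tasks).
    results = []
    for i in range(n):
        results.append(await fake_llm_call(i))
    return results


if __name__ == "__main__":
    print(asyncio.run(generate_sequentially(3)))
```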
Hmm, we have this async util called run_jobs() that runs async jobs behind a semaphore, so that only X async jobs can be in flight at any given time
The class itself should be updated to use that helper instead of asyncio.gather()
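To make that concrete, this is roughly the pattern run_jobs() implements and what the swap would look like. The names in the demo below are illustrative, and the exact import path and signature of the real helper should be checked in the async utils module:

```python
import asyncio
from typing import Awaitable, List


async def run_jobs_sketch(jobs: List[Awaitable], workers: int = 4) -> list:
    # Bound concurrency with a semaphore: at most `workers` jobs run at once,
    # instead of launching every job simultaneously with asyncio.gather().
    semaphore = asyncio.Semaphore(workers)

    async def _with_limit(job: Awaitable):
        async with semaphore:
            return await job

    return await asyncio.gather(*(_with_limit(job) for job in jobs))


async def _demo_job(i: int) -> int:
    # Hypothetical stand-in for one dataset-generation request.
    await asyncio.sleep(0.1)
    return i


if __name__ == "__main__":
    # Only 3 of the 10 jobs are ever in flight at the same time.
    print(asyncio.run(run_jobs_sketch([_demo_job(i) for i in range(10)], workers=3)))
```

Inside _agenerate_dataset, the change would then just be replacing the gather call over the generation tasks with a call to the semaphore-bounded helper.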