I am using RagDatasetGenerator or DatasetGenerator to generate an eval dataset. However, both of these use fully parallel generation with async_module.gather, which overwhelms the OpenAI API server and causes mass rate limiting. I can override the _agenerate_dataset function and run these requests in sequence instead of in parallel. But is there a more elegant way to do this than subclassing and hacking it in?
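
To make the trade-off concrete, here is a minimal sketch of a middle ground between fully parallel and fully sequential execution, using only plain asyncio: a semaphore caps how many requests are in flight at once. The names fake_llm_call, gather_bounded, generate_all, and MAX_CONCURRENCY are hypothetical stand-ins for illustration, not part of the LlamaIndex API, and wiring this into the generators would still need a hook that the library may or may not expose.

```python
import asyncio

MAX_CONCURRENCY = 2  # hypothetical cap; tune it to your rate limit


async def fake_llm_call(i: int) -> str:
    """Hypothetical stand-in for one question-generation request."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"question-{i}"


async def gather_bounded(coros, limit: int):
    """Behave like asyncio.gather, but run at most `limit` coroutines at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run(coro):
        # Each coroutine waits here until one of the `limit` slots is free.
        async with semaphore:
            return await coro

    # gather returns results in the order the coroutines were passed in.
    return await asyncio.gather(*(run(c) for c in coros))


def generate_all(n: int, limit: int = MAX_CONCURRENCY):
    """Run n fake requests with bounded concurrency and return their results."""

    async def main():
        return await gather_bounded([fake_llm_call(i) for i in range(n)], limit)

    return asyncio.run(main())
```

With limit=1 this degenerates to the sequential behaviour described above. Larger values trade throughput against the risk of hitting the rate limit again.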