Tried using this today on a collection of ~1000 blog posts. It ran for an hour without returning anything. It never errored, but I eventually stopped it out of worry that I was using a crazy amount of tokens. I don't see any docs on it.
That's a good idea. Will try tomorrow on a smaller set and report back. Was wondering if there is a parameter to specify the number of questions to be generated.
Ah, good move, I should have just done that. Just tried it on a smaller dataset and it works, so I should be able to handle a larger dataset with 1 question per chunk. Thanks for pointing this out.
I had a directory /blogposts/ with 1000 text files. I was unable to generate questions from this when putting all posts into one data loader, even with questions per chunk set to 1.
However, I found a workaround.
I broke my 1000 blog posts into 50 directories of 20 posts each.
I then iterated through the directories, set up a new reader and question generator for each, and appended the results to a running list of questions. This worked fine, and I was able to generate >500 questions in about 10-15 minutes.
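For anyone else hitting this, the batching loop above can be sketched roughly like the following. This is a generic outline, not the actual API: `generate_questions` is a placeholder for whatever reader + question generator you set up per batch, and the batch size of 20 just mirrors what worked for me.

```python
from pathlib import Path

def generate_questions(texts, questions_per_chunk=1):
    # Placeholder: swap in your real reader + question generator here.
    # This stub just emits one question per text so the sketch runs.
    return [f"Question about: {t[:30]}?" for t in texts]

def batched(items, size):
    # Yield successive slices of `items` of length `size`.
    for i in range(0, len(items), size):
        yield items[i:i + size]

def questions_from_directory(root, batch_size=20):
    # Instead of loading all 1000 posts into one loader, process
    # them in small batches and append each batch's questions.
    paths = sorted(Path(root).glob("*.txt"))
    all_questions = []
    for batch in batched(paths, batch_size):
        texts = [p.read_text() for p in batch]
        all_questions.extend(generate_questions(texts))
    return all_questions
```

Note you don't even need to physically split the files into 50 directories; slicing the file list into batches in the loop achieves the same thing.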