It's unfortunate that we have to call time.sleep(0.01) in StreamingAgentChatResponse. In JavaScript, there's a trick: schedule the upcoming work with a 0 ms timeout. That achieves the same goal of letting other workers run ahead of our work, while still letting our work proceed as quickly as possible. Can we do that in Python too? What happens if we call time.sleep(0) instead?
According to StackOverflow, the decision of "who goes next" is up to the scheduler of the OS.
So I think this time.sleep(0) trick will work. Your overall responses should be faster, with two caveats: 1) the loop will still consume 100% CPU, but 2) it will be polite and let other threads run (as long as the scheduler allows it).
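To make the trade-off concrete, here is a rough sketch of the two polling styles. It is not the actual StreamingAgentChatResponse code: wait_for_flag and run_once are hypothetical helpers, and the background worker is just a short sleep standing in for the real work.

```python
import threading
import time


def wait_for_flag(flag: threading.Event, interval: float, timeout: float = 5.0) -> int:
    """Poll `flag` until it is set, sleeping `interval` seconds between checks.

    Returns the number of polls performed. Hypothetical helper for
    illustration only.
    """
    polls = 0
    deadline = time.monotonic() + timeout
    while not flag.is_set():
        if time.monotonic() > deadline:
            raise TimeoutError("flag was never set")
        polls += 1
        # With interval=0 this still releases the GIL, so other Python
        # threads get a chance to run; which thread actually runs next
        # is decided by the OS scheduler.
        time.sleep(interval)
    return polls


def run_once(interval: float, work_seconds: float = 0.05) -> int:
    """Start a background worker and poll until it signals completion."""
    flag = threading.Event()

    def worker() -> None:
        time.sleep(work_seconds)  # stand-in for the real background work
        flag.set()

    thread = threading.Thread(target=worker)
    thread.start()
    polls = wait_for_flag(flag, interval)
    thread.join()
    return polls


if __name__ == "__main__":
    print("polls with sleep(0.01):", run_once(0.01))
    print("polls with sleep(0):   ", run_once(0))
```

The poll count with sleep(0) should come out far higher than with sleep(0.01), which is the "still consumes 100% CPU" part: the loop never really rests, it just offers to step aside on every iteration.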