How do I avoid rate-limit errors when generating OpenAI embeddings in an ingestion pipeline? The retry logic doesn't seem to help: it eventually just fails after 6 attempts. How can I track how many tokens are actually being sent per request, so I can rate-limit from my own app instead of relying on retries?
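One app-side approach (a sketch, not tied to any particular SDK) is to estimate the token count of each batch before sending it, and gate requests against your tokens-per-minute budget with a sliding window. The `TokenRateLimiter` class and the TPM figure below are illustrative assumptions; the character-based heuristic is rough, and for exact counts you would use OpenAI's `tiktoken` library (e.g. `tiktoken.get_encoding("cl100k_base")` for the embedding models) instead.

```python
import time


def approx_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text.

    For exact counts, use tiktoken instead:
        len(tiktoken.get_encoding("cl100k_base").encode(text))
    """
    return max(1, len(text) // 4)


class TokenRateLimiter:
    """Sliding-window limiter: block until a batch fits the TPM budget.

    Hypothetical helper for illustration; tokens_per_minute should match
    the limit shown for your account in the OpenAI dashboard.
    """

    def __init__(self, tokens_per_minute: int = 1_000_000):
        self.budget = tokens_per_minute
        self.window: list[tuple[float, int]] = []  # (timestamp, tokens)

    def wait_for(self, tokens: int) -> None:
        while True:
            now = time.monotonic()
            # Drop entries older than 60 seconds from the window.
            self.window = [(t, n) for t, n in self.window if now - t < 60]
            used = sum(n for _, n in self.window)
            if used + tokens <= self.budget:
                self.window.append((now, tokens))
                return
            time.sleep(1)  # budget exhausted; wait for the window to drain
```

In the ingestion loop you would call `limiter.wait_for(sum(approx_tokens(t) for t in batch))` before each embedding request; because you throttle before sending, the server-side 429s (and the 6-retry failures) should largely disappear.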
I've got a citation query engine that works fine with a synchronous query. When I switch it to use aquery, this call:

    streaming_response = await query_engine.aquery(question)

fails with:

    Error: AsyncStreamingResponse.__init__() got an unexpected keyword argument 'response_gen'