Good afternoon everyone! We are trying to move from OpenAI to Azure OpenAI but are hitting rate limits on embeddings straight away. We think this is because Azure OpenAI allows 240k tokens per minute while OpenAI allows 1M. I understand that lowering the batch size would reduce the number of tokens per request, but I am unsure whether batch size is related to time in any way; if the smaller batches are just sent back to back, we may still hit the same per-minute limit.
The only solution I can think of is to somehow introduce a delay between the execution of batches?
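Something like this is what I'm picturing (a minimal sketch, assuming the `openai` v1 Python SDK's `AzureOpenAI` client and `tiktoken` for token counting; the deployment name, API version, and environment variable names are placeholders for our setup):

```python
import os
import time

import tiktoken
from openai import AzureOpenAI

# Placeholder config -- substitute your own endpoint/deployment.
client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
)
DEPLOYMENT = "text-embedding-ada-002"  # Azure deployment name (placeholder)
TPM_LIMIT = 240_000                    # Azure tokens-per-minute quota

enc = tiktoken.get_encoding("cl100k_base")

def embed_batches(batches):
    window_start = time.monotonic()
    tokens_used = 0
    for batch in batches:
        # Count the tokens this batch will consume.
        batch_tokens = sum(len(enc.encode(text)) for text in batch)
        # If this batch would blow the one-minute token budget,
        # sleep until the current window resets before sending it.
        if tokens_used + batch_tokens > TPM_LIMIT:
            elapsed = time.monotonic() - window_start
            if elapsed < 60:
                time.sleep(60 - elapsed)
            window_start = time.monotonic()
            tokens_used = 0
        client.embeddings.create(model=DEPLOYMENT, input=batch)
        tokens_used += batch_tokens
```

Does that seem like a reasonable approach, or is there a better way to pace the batches against the quota?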