Good afternoon everyone! We are trying to move from OpenAI to Azure OpenAI, but we are hitting rate limits on embeddings straight away. We think this is because Azure OpenAI allows 240k tokens per minute while OpenAI allows 1M. I understand that lowering the batch size could potentially reduce the tokens sent per minute, but I am unsure whether batch size is related to time in any way, and therefore we may still get the same problem.

The only solution I can think of is to somehow introduce a delay between the execution of batches?
6 comments
I think usually when people use Azure for embeddings, they set the batch size to 1 πŸ€”
the default is 10
if that doesn't help, there may need to be a PR to add "delay rate-limiting" or something πŸ€”
Awesome @Logan M, thank you!
Where would you put that delay rate-limiting in? We have a few devs who are happy to help code and test this out
It would be somewhere in the BaseEmbedding class I think, in llama_index/embeddings/base.py
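As a rough illustration of the idea (not actual llama_index code — `embed_with_throttle` and the stand-in embedding function are hypothetical), a delay between batches might look something like this:

```python
import time
from typing import Callable, List

def embed_with_throttle(
    texts: List[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 1,
    delay_s: float = 0.5,
) -> List[List[float]]:
    """Embed texts in small batches, sleeping between requests
    to stay under a tokens-per-minute rate limit."""
    embeddings: List[List[float]] = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i : i + batch_size]
        embeddings.extend(embed_batch(batch))
        if i + batch_size < len(texts):
            time.sleep(delay_s)  # simple fixed delay between batches
    return embeddings

# Example with a stand-in embedding function (returns dummy vectors):
fake_embed = lambda batch: [[0.0] for _ in batch]
vectors = embed_with_throttle(["a", "b", "c"], fake_embed, batch_size=2, delay_s=0.0)
print(len(vectors))  # 3
```

A real fix would presumably live in `BaseEmbedding` as suggested above, wrapping its batch loop rather than being a standalone helper.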