Good afternoon everyone! We are trying to move from OpenAI to Azure OpenAI, but we are hitting rate limits on embeddings straight away. We think this is because Azure OpenAI allows 240k tokens per minute while OpenAI allows 1M. I understand that lowering the batch size could potentially reduce the tokens sent per minute, but I am unsure whether batch size is related to time in any way, and therefore we may still get the same problem.

The only solution I can think of is to somehow introduce a delay between the execution of batches?
6 comments
I think usually when people use Azure for embeddings, they set the batch size to 1 πŸ€”
the default is 10
if that doesn't help, there may need to be a PR to add "delay rate-limiting" or something πŸ€”
Awesome @Logan M, thank you!
Where would you put that delay rate-limiting in? We have a few devs who are happy to help code and test this out
It would be somewhere in the BaseEmbedding class I think, in llama_index/embeddings/base.py
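As a rough illustration of the idea (not actual llama_index code — `embed_with_throttle` and the stand-in embedding function are hypothetical), a delay between batches might look something like this:

```python
import time
from typing import Callable, List

def embed_with_throttle(
    texts: List[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 1,
    delay_s: float = 0.5,
) -> List[List[float]]:
    """Embed texts in small batches, sleeping between requests
    to stay under a tokens-per-minute rate limit."""
    embeddings: List[List[float]] = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i : i + batch_size]
        embeddings.extend(embed_batch(batch))
        if i + batch_size < len(texts):
            time.sleep(delay_s)  # simple fixed delay between batches
    return embeddings

# Example with a stand-in embedding function (returns dummy vectors):
fake_embed = lambda batch: [[0.0] for _ in batch]
vectors = embed_with_throttle(["a", "b", "c"], fake_embed, batch_size=2, delay_s=0.0)
print(len(vectors))  # 3
```

A real fix would presumably live in `BaseEmbedding` as suggested above, wrapping its batch loop rather than being a standalone helper.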