Ok, I confirmed that the limit for ada-002 on Azure OpenAI is indeed 2048.
import time

def test_max_batch_size(embed_model, batch_sizes):
    """Probe increasing batch sizes against a llama_index AzureOpenAIEmbedding model."""
    test_text = "This is a test."  # Sample text to duplicate for the batch.
    max_supported_batch_size = None
    for batch_size in batch_sizes:
        embed_model.embed_batch_size = batch_size  # Set the batch size for the model.
        try:
            texts = [test_text] * batch_size  # Create a batch of duplicated texts.
            response = embed_model.get_text_embedding_batch(texts)  # Send the batch for embedding.
            # If the request succeeds, record this batch size as the largest successful one so far.
            assert len(response) == batch_size
            for embedding in response:
                assert len(embedding) == 1536  # ada-002 returns 1536-dimensional vectors.
            print(f"Batch size of {batch_size} succeeded.")
            max_supported_batch_size = batch_size
            time.sleep(20)  # Delay between probes to avoid rate limiting.
        except Exception as e:
            # Handle failures based on the API's error responses.
            print(f"Batch size of {batch_size} failed with error: {e}")
            break  # Exit the loop on the first failure.
    if max_supported_batch_size:
        print(f"Maximum supported batch size is {max_supported_batch_size}")
    else:
        print("Unable to determine the maximum supported batch size; all tested sizes failed.")
Output:

Batch size of 2048 succeeded.
Batch size of 2049 failed with error: The batch size should not be larger than 2048.
Maximum supported batch size is 2048
Edit: the 2048 limit appears to be enforced client-side, likely in the openai SDK, because no API call was actually made for the batch of 2049.
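As an aside, stepping linearly through candidate sizes with a 20-second sleep per probe gets slow if the limit is unknown. The same limit can be found in O(log n) probes with a binary search. A minimal sketch, where `try_batch` is a hypothetical callable (not part of llama_index) that wraps one `get_text_embedding_batch` attempt and returns True on success:

```python
def find_max_batch_size(try_batch, low=1, high=4096):
    """Binary-search for the largest size in [low, high] where try_batch(size) succeeds.

    try_batch is a hypothetical callable wrapping one embedding request;
    it returns True if a batch of that size is accepted, False otherwise.
    Returns None if every probed size fails.
    """
    best = None
    while low <= high:
        mid = (low + high) // 2
        if try_batch(mid):
            best = mid      # mid works; try something larger.
            low = mid + 1
        else:
            high = mid - 1  # mid fails; try something smaller.
    return best

# Example with a stub that mimics the observed 2048 client-side cap:
print(find_max_batch_size(lambda n: n <= 2048))  # prints 2048
```

With a real model, `try_batch` would also need the rate-limit sleep, but only about 12 probes are required to cover sizes up to 4096 instead of one probe per candidate.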