Is there a good way to finish creating embeddings before inserting nodes into a remote index?

Is there a good way to finish creating embeddings before inserting nodes into a remote index? I'm doing this by building an in-memory index, then reading the SimpleVectorStoreData off it to attach embeddings to the nodes, and using those nodes to build my VectorStoreIndex over an Azure Cognitive Search storage context.

Plain Text
        logger.info("Indexing may take a while.  Please wait...")
        # First create an in memory index to get embeddings for all nodes.
        local_vector_store_index = VectorStoreIndex.from_documents(documents)
        nodes = list(local_vector_store_index.docstore.docs.values())
        local_vector_store_data: SimpleVectorStoreData = local_vector_store_index.vector_store._data
        for node in nodes:
            node.embedding = local_vector_store_data.embedding_dict[node.node_id]

        # azure_storage_context wraps the Azure Cognitive Search vector store;
        # building the index here pushes the pre-embedded nodes to the remote index.
        VectorStoreIndex(nodes, storage_context=azure_storage_context)
        logger.info(f"Your index {self.index_name} should now be populated at {self.service_endpoint} (go check)")
9 comments
you could just use the embedding model directly

Plain Text
nodes = ...
embeddings = embed_model.get_text_embedding_batch([n.get_content(metadata_mode="embed") for n in nodes])

assert len(embeddings) == len(nodes)

for node, embedding in zip(nodes, embeddings):
    node.embedding = embedding
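With the embeddings already attached, you can then pass the nodes straight to VectorStoreIndex(nodes, storage_context=azure_storage_context), as in your snippet, without building the throwaway in-memory index first.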
I would think that using VectorStoreIndex.from_documents(documents) has some advantages around batching and retrying, does it not?
Answering my own question here, but the embed model actually seems pretty self-contained regarding robustness of calls. Note that Azure OpenAI has a default batch size of 10, which is hot garbage.

Edit: maybe that default is global. Whatever the reason, I end up with a lot of "too many requests" retries.
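For reference, here's roughly how to bump that default when constructing the embed model (just a sketch; the constructor parameters assume a recent llama-index AzureOpenAIEmbedding, and the endpoint/deployment/key values are placeholders):

Plain Text
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding  # import path assumes llama-index >= 0.10

embed_model = AzureOpenAIEmbedding(
    deployment_name="text-embedding-ada-002",  # placeholder Azure deployment name
    api_key="...",  # placeholder
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder
    api_version="2023-05-15",
    embed_batch_size=100,  # override the default batch size of 10
)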
I've read things from last year saying the maximum OpenAI and Azure OpenAI API batch sizes for embeddings were 10, then 16, and now they seem to be 2048, for OpenAI at least. https://community.openai.com/t/embeddings-api-max-batch-size/655329/2

Edit: or maybe that's only for model="text-embedding-3-large".

Sorry, learning as I go here
Ok, I confirmed that the limit for ada-002 on Azure OpenAI is indeed 2048.

Plain Text
import time
def test_max_batch_size(embed_model: llama_index_AzureOpenAIEmbedding, batch_sizes):
    test_text = "This is a test."  # Sample text to duplicate for the batch.
    max_supported_batch_size = None

    for batch_size in batch_sizes:
        embed_model.embed_batch_size = batch_size  # Set the batch size for the model.
        try:
            texts = [test_text] * batch_size  # Create a batch of duplicated texts.
            response = embed_model.get_text_embedding_batch(texts)  # Send the batch to the model for embedding.
            # If the request is successful, record this batch size as currently the largest successful one.
            assert len(response) == batch_size
            for r in response:
                assert len(r) == 1536
            print(f"Batch size of {batch_size} succeeded.")
            max_supported_batch_size = batch_size
            time.sleep(20)  # Add a delay to avoid rate limiting.
        except Exception as e:
            # Handle specific exceptions or failures based on the API's error responses.
            print(f"Batch size of {batch_size} failed with error: {e}")
            break  # Exit the loop on the first failure.

    if max_supported_batch_size:
        print(f"Maximum supported batch size is {max_supported_batch_size}")
    else:
        print("Unable to determine the maximum supported batch size, all tested sizes failed.")


# Output:
Batch size of 2048 succeeded.
Batch size of 2049 failed with error: The batch size should not be larger than 2048.
Maximum supported batch size is 2048

Edit: it seems that the 2048 limit is imposed in code somewhere, likely in the openai SDK, because the API call for 2049 was not actually made.
yes, 2048 is the absolute limit through the openai SDK as far as I know
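For what it's worth, the error message above looks like it comes from a client-side guard along these lines (illustrative only, reconstructed from the message text rather than copied from either library), which would explain why no request goes out for 2049:

Plain Text
def _check_batch(texts):
    # Hypothetical client-side guard; oversized batches are rejected before any HTTP request is made.
    assert len(texts) <= 2048, "The batch size should not be larger than 2048."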
I tested with a 2000-token text and found that I start hitting rate limits well before 2048 texts. It did manage to batch 100 of them, though.
Even when the text is "This is a test. " * 80 (OpenAI's online tool says this is 400 tokens), I can still do a batch of 2048.
Yeah, rate limiting is fun. There's both requests-per-minute limiting and tokens-per-minute limiting. And I know Azure is more strict than plain OpenAI.
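If you're hitting those limits, one workaround (just a sketch, reusing nodes / embed_model and the method names from earlier in the thread; the batch size and sleep are arbitrary) is to embed in smaller chunks with a pause between them:

Plain Text
import time

batch_size = 100  # small enough to stay under requests/tokens-per-minute quotas
for i in range(0, len(nodes), batch_size):
    chunk = nodes[i : i + batch_size]
    embeddings = embed_model.get_text_embedding_batch(
        [n.get_content(metadata_mode="embed") for n in chunk]
    )
    for node, embedding in zip(chunk, embeddings):
        node.embedding = embedding
    time.sleep(5)  # crude throttle between batches; tune to your Azure quota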