Is there a good way to finish creating embeddings before inserting nodes into a remote index?

Is there a good way to finish creating embeddings before inserting nodes into a remote index? I'm doing this by building an in-memory index, then reading the SimpleVectorStoreData off it to attach embeddings to the nodes, and using those nodes to build my VectorStoreIndex over an Azure Cognitive Search storage context.

Plain Text
        logger.info("Indexing may take a while.  Please wait...")
        # First create an in memory index to get embeddings for all nodes.
        local_vector_store_index = VectorStoreIndex.from_documents(documents)
        nodes = list(local_vector_store_index.docstore.docs.values())
        local_vector_store_data: SimpleVectorStoreData = local_vector_store_index.vector_store._data
        for node in nodes:
            node.embedding = local_vector_store_data.embedding_dict[node.node_id]

        # azure_storage_context wraps the Azure Cognitive Search vector store;
        # building the index here pushes the pre-embedded nodes to the remote index.
        VectorStoreIndex(nodes, storage_context=azure_storage_context)
        logger.info(f"Your index {self.index_name} should now be populated at {self.service_endpoint} (go check)")
9 comments
you could just use the embedding model directly

Plain Text
nodes = ...
embeddings = embed_model.get_text_embedding_batch([n.get_content(metadata_mode="embed") for n in nodes])

assert len(embeddings) == len(nodes)

for node, embedding in zip(nodes, embeddings):
    node.embedding = embedding
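With the embeddings already attached, you can then pass the nodes straight to VectorStoreIndex(nodes, storage_context=azure_storage_context), as in your snippet, without building the throwaway in-memory index first.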
I would think that using VectorStoreIndex.from_documents(documents) has some advantages around batching and retrying, does it not?
Answering my own question here, but the embed model actually seems pretty self-contained regarding robustness of calls. Note that Azure OpenAI has a default batch size of 10, which is hot garbage.

Edit: maybe that default is global. Whatever the reason, I end up with a lot of "too many requests" retries.
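For reference, here's roughly how to bump that default when constructing the embed model (just a sketch; the constructor parameters assume a recent llama-index AzureOpenAIEmbedding, and the endpoint/deployment/key values are placeholders):

Plain Text
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding  # import path assumes llama-index >= 0.10

embed_model = AzureOpenAIEmbedding(
    deployment_name="text-embedding-ada-002",  # placeholder Azure deployment name
    api_key="...",  # placeholder
    azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder
    api_version="2023-05-15",
    embed_batch_size=100,  # override the default batch size of 10
)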
I've read things from last year saying the maximum OpenAI and Azure OpenAI API batch sizes for embeddings were 10, then 16, and now they seem to be 2048, for OpenAI at least. https://community.openai.com/t/embeddings-api-max-batch-size/655329/2

Edit: or maybe that's only for model="text-embedding-3-large".

Sorry, learning as I go here
Ok, I confirmed that the limit for ada-002 on Azure OpenAI is indeed 2048.

Plain Text
import time
def test_max_batch_size(embed_model: llama_index_AzureOpenAIEmbedding, batch_sizes):
    test_text = "This is a test."  # Sample text to duplicate for the batch.
    max_supported_batch_size = None

    for batch_size in batch_sizes:
        embed_model.embed_batch_size = batch_size  # Set the batch size for the model.
        try:
            texts = [test_text] * batch_size  # Create a batch of duplicated texts.
            response = embed_model.get_text_embedding_batch(texts)  # Send the batch to the model for embedding.
            # If the request is successful, record this batch size as currently the largest successful one.
            assert len(response) == batch_size
            for r in response:
                assert len(r) == 1536
            print(f"Batch size of {batch_size} succeeded.")
            max_supported_batch_size = batch_size
            time.sleep(20)  # Add a delay to avoid rate limiting.
        except Exception as e:
            # Handle specific exceptions or failures based on the API's error responses.
            print(f"Batch size of {batch_size} failed with error: {e}")
            break  # Exit the loop on the first failure.

    if max_supported_batch_size:
        print(f"Maximum supported batch size is {max_supported_batch_size}")
    else:
        print("Unable to determine the maximum supported batch size, all tested sizes failed.")


# Output:
Batch size of 2048 succeeded.
Batch size of 2049 failed with error: The batch size should not be larger than 2048.
Maximum supported batch size is 2048

Edit: it seems that the 2048 limit is imposed in code somewhere, likely in the openai SDK, because the API call for 2049 was not actually made.
yes, 2048 is the absolute limit through the openai SDK as far as I know
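For what it's worth, the error message above looks like it comes from a client-side guard along these lines (illustrative only, reconstructed from the message text rather than copied from either library), which would explain why no request goes out for 2049:

Plain Text
def _check_batch(texts):
    # Hypothetical client-side guard; oversized batches are rejected before any HTTP request is made.
    assert len(texts) <= 2048, "The batch size should not be larger than 2048."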
I tested with a 2000-token text and found that I start hitting rate limits well before 2048 texts. It did manage to batch 100 of them, though.
Even when the text is "This is a test. " * 80 (OpenAI's online tool says this is 400 tokens), I can still do a batch of 2048.
Yeah, rate limiting is fun. There's both requests-per-minute limiting and tokens-per-minute limiting. And I know Azure is more strict than plain OpenAI.
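If you're hitting those limits, one workaround (just a sketch, reusing nodes / embed_model and the method names from earlier in the thread; the batch size and sleep are arbitrary) is to embed in smaller chunks with a pause between them:

Plain Text
import time

batch_size = 100  # small enough to stay under requests/tokens-per-minute quotas
for i in range(0, len(nodes), batch_size):
    chunk = nodes[i : i + batch_size]
    embeddings = embed_model.get_text_embedding_batch(
        [n.get_content(metadata_mode="embed") for n in chunk]
    )
    for node, embedding in zip(chunk, embeddings):
        node.embedding = embedding
    time.sleep(5)  # crude throttle between batches; tune to your Azure quota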