Updated 2 years ago

At a glance

A community member hits an error during indexing with GPTSimpleVectorIndex and SimpleDirectoryReader: "The batch size should not be larger than 2048." The discussion covers possible causes, such as encoding issues with Cyrillic text in the documents. The error is confirmed to be a full error, not a warning, and a full traceback is provided. Another community member notes that a related PR has just landed and suggests updating the llama-index library to see whether the problem persists.

Not sure how to resolve this error: The batch size should not be larger than 2048
6 comments
I'm getting this during indexing using GPTSimpleVectorIndex and SimpleDirectoryReader (it's just indexing some .txt files).

Sounds like maybe there's a string in one of the files that's not getting split up correctly?

@jerryjliu0 any ideas? I believe there's some Cyrillic text in the docs; could this have something to do with encoding issues?
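One quick way to test the encoding theory (a sketch only; encoding was not confirmed as the cause here, and the `check_utf8` helper is hypothetical, not part of llama-index) is to try decoding each .txt file as UTF-8 before indexing:

```python
from pathlib import Path

def check_utf8(directory: str) -> list[str]:
    """Return the names of .txt files under `directory` that are not valid UTF-8.

    A UnicodeDecodeError while reading would point at an encoding problem
    with the source documents rather than a bug in the indexer.
    """
    bad = []
    for path in Path(directory).glob("*.txt"):
        try:
            path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            bad.append(path.name)
    return sorted(bad)
```

If this reports no files, Cyrillic text per se is not the problem, since UTF-8 handles it fine.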
hmm yeah @bbornsztein, is this a warning (i.e. the code still completes) or a full error?
It's an error. Here's the full trace

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/llama_index/embeddings/openai.py", line 142, in get_embeddings
    assert len(list_of_text) <= 2048, "The batch size should not be larger than 2048."
AssertionError: The batch size should not be larger than 2048.
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/llama_index/token_counter/token_counter.py", line 86, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/llama_index/indices/base.py", line 304, in insert
    self._insert(processed_doc, **insert_kwargs)
  File "/usr/local/lib/python3.9/dist-packages/llama_index/indices/vector_store/base.py", line 211, in _insert
    self._add_document_to_index(self._index_struct, document)
  File "/usr/local/lib/python3.9/dist-packages/llama_index/indices/vector_store/base.py", line 182, in _add_document_to_index
    embedding_results = self._get_node_embedding_results(
  File "/usr/local/lib/python3.9/dist-packages/llama_index/indices/vector_store/base.py", line 102, in _get_node_embedding_results
    result_ids, result_embeddings = self._embed_model.get_queued_text_embeddings()
  File "/usr/local/lib/python3.9/dist-packages/llama_index/embeddings/base.py", line 151, in get_queued_text_embeddings
    embeddings = self._get_text_embeddings(cur_batch_texts)
  File "/usr/local/lib/python3.9/dist-packages/llama_index/embeddings/openai.py", line 260, in _get_text_embeddings
    embeddings = get_embeddings(texts, engine=engine)
  File "/usr/local/lib/python3.9/dist-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/usr/local/lib/python3.9/dist-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/usr/local/lib/python3.9/dist-packages/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f65effc1160 state=finished raised AssertionError>]
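For context, the assertion comes from the OpenAI embeddings client capping each request at 2048 inputs, so queued texts have to be split into batches no larger than that before each API call. A minimal sketch of the idea (the `batched` and `embed_all` helpers below are illustrative, not the library's actual code):

```python
MAX_BATCH = 2048  # OpenAI embeddings cap on inputs per request

def batched(texts: list[str], size: int = MAX_BATCH) -> list[list[str]]:
    """Split `texts` into consecutive chunks of at most `size` items."""
    return [texts[i:i + size] for i in range(0, len(texts), size)]

def embed_all(texts: list[str], get_embeddings) -> list[list[float]]:
    """Call `get_embeddings` once per batch and concatenate the results,
    so no single call exceeds the per-request input limit."""
    embeddings: list[list[float]] = []
    for batch in batched(texts):
        embeddings.extend(get_embeddings(batch))
    return embeddings
```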
got it, thanks. this PR was just landed (i'm actually baffled that I didn't catch this before) https://github.com/jerryjliu/llama_index/pull/851
could you update llama-index and see if problem still persists?