Not sure how to resolve this error: The batch size should not be larger than 2048
I'm getting this during indexing using GPTSimpleVectorIndex and SimpleDirectoryReader (it's just indexing some .txt files).
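
For context, a minimal sketch of the setup being described, assuming the legacy GPTSimpleVectorIndex API from llama_index releases of that period (the directory path is a placeholder):

Python
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# Read every file (here, .txt files) from a local folder
documents = SimpleDirectoryReader("./docs").load_data()

# Build the vector index; each document is chunked and embedded via OpenAI
index = GPTSimpleVectorIndex(documents)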

Sounds like maybe there's a string in one of the files that's not getting split up correctly?

@jerryjliu0 any ideas? I believe there's some Cyrillic text in the docs; could this have something to do with encoding issues?
hmm yeah @bbornsztein is this a warning (e.g. the code still completes) or a full error?
It's an error. Here's the full trace:

Plain Text
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/llama_index/embeddings/openai.py", line 142, in get_embeddings
    assert len(list_of_text) <= 2048, "The batch size should not be larger than 2048."
AssertionError: The batch size should not be larger than 2048.
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/llama_index/token_counter/token_counter.py", line 86, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/llama_index/indices/base.py", line 304, in insert
    self._insert(processed_doc, **insert_kwargs)
  File "/usr/local/lib/python3.9/dist-packages/llama_index/indices/vector_store/base.py", line 211, in _insert
    self._add_document_to_index(self._index_struct, document)
  File "/usr/local/lib/python3.9/dist-packages/llama_index/indices/vector_store/base.py", line 182, in _add_document_to_index
    embedding_results = self._get_node_embedding_results(
  File "/usr/local/lib/python3.9/dist-packages/llama_index/indices/vector_store/base.py", line 102, in _get_node_embedding_results
    result_ids, result_embeddings = self._embed_model.get_queued_text_embeddings()
  File "/usr/local/lib/python3.9/dist-packages/llama_index/embeddings/base.py", line 151, in get_queued_text_embeddings
    embeddings = self._get_text_embeddings(cur_batch_texts)
  File "/usr/local/lib/python3.9/dist-packages/llama_index/embeddings/openai.py", line 260, in _get_text_embeddings
    embeddings = get_embeddings(texts, engine=engine)
  File "/usr/local/lib/python3.9/dist-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/usr/local/lib/python3.9/dist-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/usr/local/lib/python3.9/dist-packages/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f65effc1160 state=finished raised AssertionError>]
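
For reference, the 2048 figure matches the OpenAI embeddings endpoint's cap on the number of inputs per request; the assertion in llama_index guards that limit before the API call. Independent of any library fix, a generic way to stay under such a cap is to split a text list into sub-batches; the helper below is illustrative, not part of llama_index:

Python
def batched(items, size=2048):
    """Yield successive slices of `items` no longer than `size`."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# e.g. embed each sub-batch separately instead of the whole list at once:
# for chunk in batched(texts):
#     embeddings.extend(get_embeddings(chunk, engine=engine))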
got it, thanks. this PR just landed (I'm actually baffled that I didn't catch this before): https://github.com/jerryjliu/llama_index/pull/851
could you update llama-index and see if the problem still persists?
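
A sketch of the suggested fix path, with one hedged fallback: the upgrade command is standard pip, while embed_batch_size and the embed_model keyword are assumptions about the installed version's API (later releases expose both; check the signatures in your version):

Python
# First, pick up the fix: pip install --upgrade llama-index
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader
from llama_index.embeddings.openai import OpenAIEmbedding

documents = SimpleDirectoryReader("./docs").load_data()

# embed_batch_size is assumed to exist on this version; it keeps each
# embedding request well below the 2048-input cap
embed_model = OpenAIEmbedding(embed_batch_size=10)
index = GPTSimpleVectorIndex(documents, embed_model=embed_model)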