Updated 2 years ago

At a glance

A community member hits an error during indexing with GPTSimpleVectorIndex and SimpleDirectoryReader: "The batch size should not be larger than 2048." The discussion covers possible causes, such as encoding issues with Cyrillic text in the documents. The error is confirmed to be a full error, not a warning, and a full traceback is provided. Another community member notes that a related PR has just landed and suggests updating the llama-index library to see whether the problem persists.

Not sure how to resolve this error: The batch size should not be larger than 2048
6 comments
I'm getting this during indexing using GPTSimpleVectorIndex and SimpleDirectoryReader (it's just indexing some .txt files).

Sounds like maybe there's a string in one of the files that's not getting split up correctly?

@jerryjliu0 any ideas? I believe there's some Cyrillic text in the docs; could this have something to do with encoding issues?
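One quick way to test the encoding theory (a sketch only; encoding was not confirmed as the cause here, and the `check_utf8` helper is hypothetical, not part of llama-index) is to try decoding each .txt file as UTF-8 before indexing:

```python
from pathlib import Path

def check_utf8(directory: str) -> list[str]:
    """Return the names of .txt files under `directory` that are not valid UTF-8.

    A UnicodeDecodeError while reading would point at an encoding problem
    with the source documents rather than a bug in the indexer.
    """
    bad = []
    for path in Path(directory).glob("*.txt"):
        try:
            path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            bad.append(path.name)
    return sorted(bad)
```

If this reports no files, Cyrillic text per se is not the problem, since UTF-8 handles it fine.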
hmm yeah @bbornsztein, is this a warning (i.e. the code still completes) or a full error?
It's an error. Here's the full trace

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/llama_index/embeddings/openai.py", line 142, in get_embeddings
    assert len(list_of_text) <= 2048, "The batch size should not be larger than 2048."
AssertionError: The batch size should not be larger than 2048.
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/llama_index/token_counter/token_counter.py", line 86, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/llama_index/indices/base.py", line 304, in insert
    self._insert(processed_doc, **insert_kwargs)
  File "/usr/local/lib/python3.9/dist-packages/llama_index/indices/vector_store/base.py", line 211, in _insert
    self._add_document_to_index(self._index_struct, document)
  File "/usr/local/lib/python3.9/dist-packages/llama_index/indices/vector_store/base.py", line 182, in _add_document_to_index
    embedding_results = self._get_node_embedding_results(
  File "/usr/local/lib/python3.9/dist-packages/llama_index/indices/vector_store/base.py", line 102, in _get_node_embedding_results
    result_ids, result_embeddings = self._embed_model.get_queued_text_embeddings()
  File "/usr/local/lib/python3.9/dist-packages/llama_index/embeddings/base.py", line 151, in get_queued_text_embeddings
    embeddings = self._get_text_embeddings(cur_batch_texts)
  File "/usr/local/lib/python3.9/dist-packages/llama_index/embeddings/openai.py", line 260, in _get_text_embeddings
    embeddings = get_embeddings(texts, engine=engine)
  File "/usr/local/lib/python3.9/dist-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/usr/local/lib/python3.9/dist-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/usr/local/lib/python3.9/dist-packages/tenacity/__init__.py", line 326, in iter
    raise retry_exc from fut.exception()
tenacity.RetryError: RetryError[<Future at 0x7f65effc1160 state=finished raised AssertionError>]
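For context, the assertion comes from the OpenAI embeddings client capping each request at 2048 inputs, so queued texts have to be split into batches no larger than that before each API call. A minimal sketch of the idea (the `batched` and `embed_all` helpers below are illustrative, not the library's actual code):

```python
MAX_BATCH = 2048  # OpenAI embeddings cap on inputs per request

def batched(texts: list[str], size: int = MAX_BATCH) -> list[list[str]]:
    """Split `texts` into consecutive chunks of at most `size` items."""
    return [texts[i:i + size] for i in range(0, len(texts), size)]

def embed_all(texts: list[str], get_embeddings) -> list[list[float]]:
    """Call `get_embeddings` once per batch and concatenate the results,
    so no single call exceeds the per-request input limit."""
    embeddings: list[list[float]] = []
    for batch in batched(texts):
        embeddings.extend(get_embeddings(batch))
    return embeddings
```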
got it, thanks. this PR was just landed (i'm actually baffled that I didn't catch this before) https://github.com/jerryjliu/llama_index/pull/851
could you update llama-index and see if problem still persists?