Updated 2 years ago

I'm getting this error: [ERROR] IndexError

At a glance

The community member is hitting an "IndexError: list index out of range" when building a GPTSimpleVectorIndex (from the gpt_index library) over large PDF datasheets with a small chunk_size_limit. In the replies, one community member notes that a pull request (https://github.com/jerryjliu/gpt_index/pull/306) should fix the problem and that it will be patched into the next release; pulling the main branch gets the fix sooner.
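
For context, here is a minimal sketch of the kind of call that produces this traceback. The SimpleDirectoryReader loader and the "./datasheets" path are illustrative assumptions; the relevant part is building a GPTSimpleVectorIndex with a small chunk_size_limit, exactly as in the traceback below.

from gpt_index import GPTSimpleVectorIndex, SimpleDirectoryReader

# Load the PDF datasheets into Document objects
# (the reader and path are placeholders for however the documents are actually produced).
documents = SimpleDirectoryReader("./datasheets").load_data()

# On affected gpt_index versions, building the index with a small chunk_size_limit
# raises "IndexError: list index out of range" inside text_splitter.split_text().
index = GPTSimpleVectorIndex(documents, chunk_size_limit=256)

On a build that includes the fix from https://github.com/jerryjliu/gpt_index/pull/306, the same call should complete without the IndexError.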

I'm getting this error: "[ERROR] IndexError: list index out of range
Traceback (most recent call last):
  File "/var/task/app.py", line 85, in handler
    index = GPTSimpleVectorIndex(documents, chunk_size_limit=256)
  File "/var/lang/lib/python3.8/site-packages/gpt_index/indices/vector_store/simple.py", line 48, in init
    super().init(
  File "/var/lang/lib/python3.8/site-packages/gpt_index/indices/vector_store/base.py", line 43, in init
    super().init(
  File "/var/lang/lib/python3.8/site-packages/gpt_index/indices/base.py", line 96, in init
    self._index_struct = self.build_index_from_documents(
  File "/var/lang/lib/python3.8/site-packages/gpt_index/token_counter/token_counter.py", line 54, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
  File "/var/lang/lib/python3.8/site-packages/gpt_index/indices/base.py", line 231, in build_index_from_documents
    return self._build_index_from_documents(documents, verbose=verbose)
  File "/var/lang/lib/python3.8/site-packages/gpt_index/indices/vector_store/base.py", line 74, in _build_index_from_documents
    self._add_document_to_index(index_struct, d, text_splitter)
  File "/var/lang/lib/python3.8/site-packages/gpt_index/indices/vector_store/simple.py", line 64, in _add_document_to_index
    nodes = self._get_nodes_from_document(document, text_splitter)
  File "/var/lang/lib/python3.8/site-packages/gpt_index/indices/base.py", line 197, in _get_nodes_from_document
    text_chunks = text_splitter.split_text(document.get_text())
  File "/var/lang/lib/python3.8/site-packages/gpt_index/langchain_helpers/text_splitter.py", line 128, in split_text
    cur_num_tokens = max(len(self.tokenizer(splits[start_idx])), 1)
" when parsing large PDF datasheets with small chunk sizes
7 comments
this looks like a new error
i'll take a look tonight, sorry about that
will patch it into tomorrow's release
(you can pull main for now if you want it sooner)
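(For reference, one way to pull in the unreleased fix is to install straight from the repo, e.g. pip install git+https://github.com/jerryjliu/gpt_index.git; the exact command is an assumption and depends on your setup.)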