Find answers from the community

Does the PDF reader use OCR?
7 comments
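The built-in PDF reader in gpt_index extracts embedded text (early versions go through PyPDF2) and does not run OCR, so scanned PDFs come back with little or no text. A minimal sketch of OCR-ing a scanned PDF into Documents yourself, assuming pdf2image and pytesseract are installed and "datasheet.pdf" stands in for the real file:

    from pdf2image import convert_from_path
    import pytesseract
    from gpt_index import Document, GPTSimpleVectorIndex

    # Render each page to an image, OCR it with Tesseract, and wrap the
    # recovered text in gpt_index Documents before indexing.
    pages = convert_from_path("datasheet.pdf")  # placeholder filename
    documents = [Document(pytesseract.image_to_string(page)) for page in pages]
    index = GPTSimpleVectorIndex(documents)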
When I run list_index = GPTListIndex([index1, index2, index3]), I get the following error:

  File "/Users/a/opt/anaconda3/lib/python3.9/site-packages/gpt_index/indices/base.py", line 197, in _get_nodes_from_document
    text_chunks = text_splitter.split_text(document.get_text())
  File "/Users/a/opt/anaconda3/lib/python3.9/site-packages/gpt_index/langchain_helpers/text_splitter.py", line 97, in split_text
    splits = text.split(self._separator)
AttributeError: 'Response' object has no attribute 'split'
30 comments
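The traceback points at a composed sub-index whose stored text is a Response object rather than a string: query() returns a Response, and if that is passed to set_text(), the splitter later calls .split() on it and fails. A minimal sketch of the fix, assuming the early gpt_index composability API where each sub-index carries a summary via set_text():

    from gpt_index import GPTListIndex

    # index1/index2/index3 are the sub-indices from the question; the
    # summary prompt is a placeholder. str(...) (or summary.response)
    # extracts plain text from the Response object.
    for sub_index in (index1, index2, index3):
        summary = sub_index.query("Summarize this document.")
        sub_index.set_text(str(summary))

    list_index = GPTListIndex([index1, index2, index3])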
I'm getting this error"[ERROR] IndexError: list index out of range
Traceback (most recent call last):
  File "/var/task/app.py", line 85, in handler
    index = GPTSimpleVectorIndex(documents, chunk_size_limit=256)
  File "/var/lang/lib/python3.8/site-packages/gpt_index/indices/vector_store/simple.py", line 48, in init
    super().init(
  File "/var/lang/lib/python3.8/site-packages/gpt_index/indices/vector_store/base.py", line 43, in init
    super().init(
  File "/var/lang/lib/python3.8/site-packages/gpt_index/indices/base.py", line 96, in init
    self._index_struct = self.build_index_from_documents(
  File "/var/lang/lib/python3.8/site-packages/gpt_index/token_counter/token_counter.py", line 54, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
  File "/var/lang/lib/python3.8/site-packages/gpt_index/indices/base.py", line 231, in build_index_from_documents
    return self._build_index_from_documents(documents, verbose=verbose)
  File "/var/lang/lib/python3.8/site-packages/gpt_index/indices/vector_store/base.py", line 74, in _build_index_from_documents
    self._add_document_to_index(index_struct, d, text_splitter)
  File "/var/lang/lib/python3.8/site-packages/gpt_index/indices/vector_store/simple.py", line 64, in _add_document_to_index
    nodes = self._get_nodes_from_document(document, text_splitter)
  File "/var/lang/lib/python3.8/site-packages/gpt_index/indices/base.py", line 197, in _get_nodes_from_document
    text_chunks = text_splitter.split_text(document.get_text())
  File "/var/lang/lib/python3.8/site-packages/gpt_index/langchain_helpers/text_splitter.py", line 128, in split_text
    cur_num_tokens = max(len(self.tokenizer(splits[start_idx])), 1)
" when parsing large PDF datasheets with small chunk sizes
7 comments
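A common trigger for this IndexError is a PDF page whose extracted text is empty (or collapses to nothing at a very small chunk size), leaving the splitter with an empty splits list. One hedged workaround, assuming the Document.get_text() API visible in the traceback: drop empty documents and loosen the chunk size.

    from gpt_index import GPTSimpleVectorIndex

    # Filter out pages with no extractable text before indexing, and use a
    # less aggressive chunk size than 256 tokens.
    documents = [d for d in documents if d.get_text().strip()]
    index = GPTSimpleVectorIndex(documents, chunk_size_limit=512)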
Is this something to worry about: "Token indices sequence length is longer than the specified maximum sequence length for this model"?
16 comments
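That message comes from the GPT-2 tokenizer that gpt_index uses only for token counting, so by itself it is usually harmless; it matters only if whole chunks exceed the LLM's context window. A minimal sketch of capping chunk sizes with PromptHelper, assuming the 0.4-era constructor signature and a 4k-context model (adjust the numbers to your model):

    from gpt_index import GPTSimpleVectorIndex, PromptHelper

    # max_input_size / num_output / max_chunk_overlap are assumptions for a
    # 4k-context model; the helper sizes chunks to fit the prompt budget.
    prompt_helper = PromptHelper(max_input_size=4096, num_output=256, max_chunk_overlap=20)
    index = GPTSimpleVectorIndex(documents, prompt_helper=prompt_helper)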
I'm trying to parse this article: https://en.wikipedia.org/wiki/Economy_of_the_United_States#Mergers_and_acquisitions. The section in the attached screenshot has some info about the 2017 GDP per capita in the US. My query asks for the GDP per capita in 2022, but it mistakenly returns the 2017 value as the 2022 figure.
13 comments
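When only the single nearest chunk is retrieved, the 2017 figure can outrank the 2022 one and the model answers from the wrong chunk. A hedged sketch, assuming a GPTSimpleVectorIndex over the page: widen retrieval with similarity_top_k so both figures land in the context and the model can pick the right year.

    # Retrieve several chunks instead of one; the query string is illustrative.
    response = index.query(
        "What was the US GDP per capita in 2022?",
        similarity_top_k=3,
    )
    print(response)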