Find answers from the community

Does the PDF reader use OCR?
7 comments
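The built-in PDF reader in gpt_index extracts embedded text (early versions go through PyPDF2) and does not run OCR, so scanned PDFs come back with little or no text. A minimal sketch of OCR-ing a scanned PDF into Documents yourself, assuming pdf2image and pytesseract are installed and "datasheet.pdf" stands in for the real file:

    from pdf2image import convert_from_path
    import pytesseract
    from gpt_index import Document, GPTSimpleVectorIndex

    # Render each page to an image, OCR it with Tesseract, and wrap the
    # recovered text in gpt_index Documents before indexing.
    pages = convert_from_path("datasheet.pdf")  # placeholder filename
    documents = [Document(pytesseract.image_to_string(page)) for page in pages]
    index = GPTSimpleVectorIndex(documents)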
When I run list_index = GPTListIndex([index1, index2, index3]), I get the following error:

  File "/Users/a/opt/anaconda3/lib/python3.9/site-packages/gpt_index/indices/base.py", line 197, in _get_nodes_from_document
    text_chunks = text_splitter.split_text(document.get_text())
  File "/Users/a/opt/anaconda3/lib/python3.9/site-packages/gpt_index/langchain_helpers/text_splitter.py", line 97, in split_text
    splits = text.split(self._separator)
AttributeError: 'Response' object has no attribute 'split'
30 comments
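The traceback points at a composed sub-index whose stored text is a Response object rather than a string: query() returns a Response, and if that is passed to set_text(), the splitter later calls .split() on it and fails. A minimal sketch of the fix, assuming the early gpt_index composability API where each sub-index carries a summary via set_text():

    from gpt_index import GPTListIndex

    # index1/index2/index3 are the sub-indices from the question; the
    # summary prompt is a placeholder. str(...) (or summary.response)
    # extracts plain text from the Response object.
    for sub_index in (index1, index2, index3):
        summary = sub_index.query("Summarize this document.")
        sub_index.set_text(str(summary))

    list_index = GPTListIndex([index1, index2, index3])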
I'm getting this error"[ERROR] IndexError: list index out of range
Traceback (most recent call last):
  File "/var/task/app.py", line 85, in handler
    index = GPTSimpleVectorIndex(documents, chunk_size_limit=256)
  File "/var/lang/lib/python3.8/site-packages/gpt_index/indices/vector_store/simple.py", line 48, in init
    super().init(
  File "/var/lang/lib/python3.8/site-packages/gpt_index/indices/vector_store/base.py", line 43, in init
    super().init(
  File "/var/lang/lib/python3.8/site-packages/gpt_index/indices/base.py", line 96, in init
    self._index_struct = self.build_index_from_documents(
  File "/var/lang/lib/python3.8/site-packages/gpt_index/token_counter/token_counter.py", line 54, in wrapped_llm_predict
    f_return_val = f(_self, *args, **kwargs)
  File "/var/lang/lib/python3.8/site-packages/gpt_index/indices/base.py", line 231, in build_index_from_documents
    return self._build_index_from_documents(documents, verbose=verbose)
  File "/var/lang/lib/python3.8/site-packages/gpt_index/indices/vector_store/base.py", line 74, in _build_index_from_documents
    self._add_document_to_index(index_struct, d, text_splitter)
  File "/var/lang/lib/python3.8/site-packages/gpt_index/indices/vector_store/simple.py", line 64, in _add_document_to_index
    nodes = self._get_nodes_from_document(document, text_splitter)
  File "/var/lang/lib/python3.8/site-packages/gpt_index/indices/base.py", line 197, in _get_nodes_from_document
    text_chunks = text_splitter.split_text(document.get_text())
  File "/var/lang/lib/python3.8/site-packages/gpt_index/langchain_helpers/text_splitter.py", line 128, in split_text
    cur_num_tokens = max(len(self.tokenizer(splits[start_idx])), 1)
" when parsing large PDF datasheets with small chunk sizes
7 comments
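A common trigger for this IndexError is a PDF page whose extracted text is empty (or collapses to nothing at a very small chunk size), leaving the splitter with an empty splits list. One hedged workaround, assuming the Document.get_text() API visible in the traceback: drop empty documents and loosen the chunk size.

    from gpt_index import GPTSimpleVectorIndex

    # Filter out pages with no extractable text before indexing, and use a
    # less aggressive chunk size than 256 tokens.
    documents = [d for d in documents if d.get_text().strip()]
    index = GPTSimpleVectorIndex(documents, chunk_size_limit=512)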
Is this something to worry about: "Token indices sequence length is longer than the specified maximum sequence length for this model"?
16 comments
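That message comes from the GPT-2 tokenizer that gpt_index uses only for token counting, so by itself it is usually harmless; it matters only if whole chunks exceed the LLM's context window. A minimal sketch of capping chunk sizes with PromptHelper, assuming the 0.4-era constructor signature and a 4k-context model (adjust the numbers to your model):

    from gpt_index import GPTSimpleVectorIndex, PromptHelper

    # max_input_size / num_output / max_chunk_overlap are assumptions for a
    # 4k-context model; the helper sizes chunks to fit the prompt budget.
    prompt_helper = PromptHelper(max_input_size=4096, num_output=256, max_chunk_overlap=20)
    index = GPTSimpleVectorIndex(documents, prompt_helper=prompt_helper)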
I'm trying to parse this article: https://en.wikipedia.org/wiki/Economy_of_the_United_States#Mergers_and_acquisitions. The section in the attached screenshot has some info about the 2017 GDP per capita in the US. My query asks for the GDP per capita in 2022, but it mistakenly returns the 2017 value as the 2022 figure.
13 comments
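When only the single nearest chunk is retrieved, the 2017 figure can outrank the 2022 one and the model answers from the wrong chunk. A hedged sketch, assuming a GPTSimpleVectorIndex over the page: widen retrieval with similarity_top_k so both figures land in the context and the model can pick the right year.

    # Retrieve several chunks instead of one; the query string is illustrative.
    response = index.query(
        "What was the US GDP per capita in 2022?",
        similarity_top_k=3,
    )
    print(response)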