AttributeError Traceback (most recent call last)

Updated 2 years ago

At a glance

The post describes an AttributeError raised while building a GPTListIndex from three sub-indices: index1, index2, and index3. The traceback begins in the __init__ method of the GPTListIndex class.

In the comments, community members discuss possible causes of the issue. They note that index1 is loaded directly from a document, while index2 and index3 are vector indices built using the BeautifulSoupWebReader. One community member suggests trying to set the text of each sub-index to a dummy string, but the original poster confirms that the text is already set for all the indices.

The community members speculate that the issue might be related to how the BeautifulSoupWebReader is parsing the document, and suggest trying to print the documents from BeautifulSoup before constructing the index to investigate further.

Plain Text
AttributeError                            Traceback (most recent call last)
<ipython-input-25-2b2489375d2b> in <module>
     11 "CeFi, People to Watch, Crypto Policy, Ethereum, L1, L2, DAOs and Web3.")
     12 
---> 13 list_index = GPTListIndex([index1, index2, index3])
     14 graph = ComposableGraph.build_from_index(list_index)
     15 

8 frames
/usr/local/lib/python3.9/dist-packages/llama_index/indices/list/base.py in __init__(self, documents, index_struct, text_qa_template, llm_predictor, text_splitter, **kwargs)
     55         """Initialize params."""
     56         self.text_qa_template = text_qa_template or DEFAULT_TEXT_QA_PROMPT
---> 57         super().__init__(
     58             documents=documents,
     59             index_struct=index_struct,

/usr/local/lib/python3.9/dist-packages/llama_index/indices/base.py in __init__(self, documents, index_struct, llm_predictor, embed_model, docstore, index_registry, prompt_helper, text_splitter, chunk_size_limit, include_extra_info, llama_logger)
    112             self._validate_documents(documents)
    113             # TODO: introduce document store outside __init__ function
--> 114             self._index_struct = self.build_index_from_documents(documents)
    115         # update index registry and docstore with index_struct
    116         self._update_index_registry_and_docstore()

/usr/local/lib/python3.9/dist-packages/llama_index/token_counter/token_counter.py in wrapped_llm_predict(_self, *args, **kwargs)
     84         def wrapped_llm_predict(_self: Any, *args: Any, **kwargs: Any) -> Any:
     85             with wrapper_logic(_self):
---> 86                 f_return_val = f(_self, *args, **kwargs)
     87 
     88             return f_return_val
Plain Text
/usr/local/lib/python3.9/dist-packages/llama_index/indices/base.py in build_index_from_documents(self, documents)
    284     def build_index_from_documents(self, documents: Sequence[BaseDocument]) -> IS:
    285         """Build the index from documents."""
--> 286         return self._build_index_from_documents(documents)
    287 
    288     @abstractmethod

/usr/local/lib/python3.9/dist-packages/llama_index/indices/list/base.py in _build_index_from_documents(self, documents)
     90         index_struct = IndexList()
     91         for d in documents:
---> 92             nodes = self._get_nodes_from_document(d)
     93             for n in nodes:
     94                 index_struct.add_node(n)

/usr/local/lib/python3.9/dist-packages/llama_index/indices/base.py in _get_nodes_from_document(self, document, start_idx)
    266         start_idx: int = 0,
    267     ) -> List[Node]:
--> 268         return get_nodes_from_document(
    269             document=document,
    270             text_splitter=self._text_splitter,

/usr/local/lib/python3.9/dist-packages/llama_index/indices/node_utils.py in get_nodes_from_document(document, text_splitter, start_idx, include_extra_info)
     48 ) -> List[Node]:
     49     """Add document to index."""
---> 50     text_splits = get_text_splits_from_document(
     51         document=document,
     52         text_splitter=text_splitter,

/usr/local/lib/python3.9/dist-packages/llama_index/indices/node_utils.py in get_text_splits_from_document(document, text_splitter, include_extra_info)
     28     if isinstance(text_splitter, TokenTextSplitter):
     29         # use this to extract extra information about the chunks
---> 30         text_splits = text_splitter.split_text_with_overlaps(
     31             document.get_text(),
     32             extra_info_str=document.extra_info_str if include_extra_info else None,
Plain Text
/usr/local/lib/python3.9/dist-packages/llama_index/langchain_helpers/text_splitter.py in split_text_with_overlaps(self, text, extra_info_str)
    141 
    142         # First we naively split the large input into a bunch of smaller ones.
--> 143         splits = text.split(self._separator)
    144         splits = self._preprocess_splits(splits, effective_chunk_size)
    145         # We now want to combine these smaller pieces into medium size

AttributeError: 'dict' object has no attribute 'split'
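
The bottom frame fails on text.split(self._separator), which means document.get_text() handed the splitter a dict instead of a str. A minimal, self-contained illustration of that failure mode (the payload below is hypothetical):

Python
# Hypothetical bad payload: a dict where the splitter expects a plain string.
text = {"title": "page", "body": "content"}

try:
    text.split(" ")  # str has .split; dict does not
except AttributeError as err:
    print(err)  # 'dict' object has no attribute 'split'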
From "index1, index2 and index3", 2 and 3 are vector indices built with BeautifulSoupWebReader. Index1 is loaded right from a document. If I repeat index1 n times, the list index is built correctly.
Huh, that's a little weird πŸ€” maybe try setting the text of each sub-index to a dummy string?

index_1.set_text("")

but for each index.
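
Spelled out with the thread's own index names (the summary strings below are placeholders, and set_text is the v0.4-era llama_index call quoted above), the suggestion amounts to:

Python
# Hypothetical summaries; the point is that every sub-index carries some text.
index1.set_text("Summary of the local document.")
index2.set_text("Summary of the first scraped site.")
index3.set_text("Summary of the second scraped site.")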
All of them have text already set. Do you mean clearing this parameter from indices 2 and 3?
I think this is somehow related to how this BeautifulSoup data loader is parsing the doc.
Yea that might be it too. Kinda weird how a dict object got there though πŸ€”
Could try printing the documents from BeautifulSoup before constructing the index.
yeah, this is a good test, I'll check it
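
A minimal sketch of that test, assuming the llama-hub loader workflow of this era (download_loader and the urls parameter; the URL below is a placeholder):

Python
from llama_index import download_loader

# Fetch the same pages the vector indices were built from and inspect the
# raw Document objects before any index is constructed.
BeautifulSoupWebReader = download_loader("BeautifulSoupWebReader")
documents = BeautifulSoupWebReader().load_data(urls=["https://example.com"])

for i, doc in enumerate(documents):
    text = doc.get_text()
    print(i, type(text))     # should be <class 'str'>; a dict here reproduces the error
    print(repr(text)[:200])  # peek at the start of the content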