summarize fail

At a glance

The post shows code for summarizing an article with the LlamaIndex library. The summarization failed with an IndexError raised while the library parsed the LLM's document-selection answer. Community members suggested reading the file with an explicit encoding and using a ListIndex with response_mode="tree_summarize" instead of a DocumentSummaryIndex. The original poster eventually got it working by calling get_document_summary instead of using the index as a query engine. There is no explicitly marked answer, but the community members worked together to troubleshoot and find a solution.

ZachHandley

Plain Text

llm_predictor_chatgpt = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor_chatgpt, chunk_size=1024)
# get the summary prompt
summary_prompt = ""
with open("summary_prompt.txt", "r") as f:
    summary_prompt = f.read()
# use the prompt text that was just read (the original assigned a misspelled, empty variable here)
summary_query = summary_prompt
print(f"Summary Query length {len(summary_query)}")
text = f"{post.title}\n{post.subtitle}\n{post.content}"
document = Document(text, article.url)
document_summary_index = DocumentSummaryIndex.from_documents([document], service_context=service_context)
index = document_summary_index.as_query_engine()
summary = index.query(summary_query)
print(f"Summary: {summary}")

7 comments

ZachHandley

Plain Text

File "/Users/zachhandley/Documents/GitHub/AI-ChannelBot/article_summarizer.py", line 28, in summarize_article
    summary = index.query(summary_query)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/llama_index/indices/query/base.py", line 23, in query
    response = self._query(str_or_query_bundle)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/llama_index/query_engine/retriever_query_engine.py", line 142, in _query
    nodes = self._retriever.retrieve(query_bundle)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/llama_index/indices/base_retriever.py", line 21, in retrieve
    return self._retrieve(str_or_query_bundle)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/llama_index/indices/document_summary/retrievers.py", line 80, in _retrieve
    raw_choices, relevances = self._parse_choice_select_answer_fn(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/llama_index/indices/utils.py", line 100, in default_parse_choice_select_answer_fn
    answer_num = int(line_tokens[0].split(":")[1].strip())
                     ~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
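
The last frame above can be reproduced without LlamaIndex. The following is a minimal sketch, not the library's code: `parse_choice_line` and `parse_choice_line_safe` are hypothetical helpers, and the "Doc: <n>, Relevance: <score>" answer shape is an assumption about what the parser expects. It reuses the expression from the traceback to show how an answer with no colon before the first comma leads to the IndexError.

```python
def parse_choice_line(line):
    """Parse one 'Doc: <n>, Relevance: <score>' line (simplified illustration)."""
    line_tokens = line.split(",")
    # Same expression as the last traceback frame: take the text after ':'.
    answer_num = int(line_tokens[0].split(":")[1].strip())
    relevance = float(line_tokens[1].split(":")[1].strip())
    return answer_num, relevance


def parse_choice_line_safe(line):
    """Return None instead of raising when the line is not in the expected shape."""
    try:
        return parse_choice_line(line)
    except (IndexError, ValueError):
        return None


well_formed = "Doc: 1, Relevance: 9"
free_text = "The article discusses pricing, not a ranked choice"

print(parse_choice_line(well_formed))     # parses the document number and score
print(parse_choice_line_safe(free_text))  # no ':' in the first token, so None

try:
    parse_choice_line(free_text)
except IndexError as exc:
    print(f"IndexError: {exc}")
```

If the LLM replies with free text instead of a ranked choice, the first token has no colon, `split(":")` returns a single-element list, and indexing `[1]` fails in the same way as in the traceback.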

ZachHandley

summarize fail

WhiteFang_Jr

Could you try reading the file like this?

Plain Text

# open the file with an explicit encoding; the with block closes it afterwards
with open("data.txt", "r", encoding="utf-8") as file1:
    # read the file
    read_content = file1.read()

Logan M

It seems the LLM isn't following instructions (it's failing when "selecting"). This looks like a small bug: when the document summary index has only one document, it shouldn't need to "select" a document at all.

Although, if you just want the summary of a file, I would use a ListIndex with response_mode="tree_summarize".

Plain Text

from llama_index import ListIndex
index = ListIndex.from_documents([document])
query_engine = index.as_query_engine(response_mode="tree_summarize")
# query through the query engine; the index object itself is not the query engine
response = query_engine.query("Summarize this document")

ZachHandley

Gotcha, noted. Quick question: what would you recommend if I want to select 10 times from a list of things? Is an agent the right thing to use?

ZachHandley

I did end up getting it to work

ZachHandley

I had to change the way I was asking for a summary: rather than using the document summary index as a query engine, I used get_document_summary, and that worked really well.
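
For readers who want to see the shape of that change, here is a rough sketch under stated assumptions. `get_document_summary` is the method named above; `fetch_summary` and `StubSummaryIndex` are hypothetical names, and the stub only stands in for an already built DocumentSummaryIndex so the example runs offline. It is also assumed that the method takes the document's id (in the original post, presumably the `article.url` passed to `Document`).

```python
class StubSummaryIndex:
    """Hypothetical stand-in for a built DocumentSummaryIndex (no LLM calls)."""

    def __init__(self, summaries):
        # Maps doc_id -> summary text produced when the index was built.
        self._summaries = dict(summaries)

    def get_document_summary(self, doc_id):
        if doc_id not in self._summaries:
            raise ValueError(f"doc_id {doc_id} not in index")
        return self._summaries[doc_id]


def fetch_summary(summary_index, doc_id):
    """Read the stored summary directly, skipping the query engine's select step."""
    return summary_index.get_document_summary(doc_id)


# Assumed doc id: whatever id was given to the Document when the index was built.
doc_id = "https://example.com/article"
index = StubSummaryIndex({doc_id: "A short summary of the article."})
print(fetch_summary(index, doc_id))
```

With a real index, the same call would look like `document_summary_index.get_document_summary(doc_id)`. Because the summary is looked up by id rather than retrieved through a query, the "select a document" parsing step that raised the IndexError is not involved.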
