Updated 2 years ago

summarize fail

At a glance

The post shows code for summarizing an article using the LlamaIndex library. The community members encountered an issue where the summarization failed due to an IndexError. They tried various approaches, such as reading the file differently and using a ListIndex instead of a DocumentSummaryIndex. Eventually, one community member found a solution by using the get_document_summary method instead of the query engine. There is no explicitly marked answer, but the community members worked together to troubleshoot and find a solution.

Plain Text
# Imports assumed for the legacy llama_index release that appears in the traceback
from langchain.chat_models import ChatOpenAI
from llama_index import Document, DocumentSummaryIndex, LLMPredictor, ServiceContext

llm_predictor_chatgpt = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo"))
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor_chatgpt, chunk_size=1024)
# read the summary prompt from disk
with open("summary_prompt.txt", "r") as f:
    summary_prompt = f.read()
summary_query = summary_prompt
print(f"Summary Query length {len(summary_query)}")
# post and article are defined elsewhere in the poster's script
text = f"{post.title}\n{post.subtitle}\n{post.content}"
document = Document(text, article.url)
document_summary_index = DocumentSummaryIndex.from_documents([document], service_context=service_context)
index = document_summary_index.as_query_engine()
summary = index.query(summary_query)
print(f"Summary: {summary}")
7 comments
Plain Text
File "/Users/zachhandley/Documents/GitHub/AI-ChannelBot/article_summarizer.py", line 28, in summarize_article
    summary = index.query(summary_query)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/llama_index/indices/query/base.py", line 23, in query
    response = self._query(str_or_query_bundle)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/llama_index/query_engine/retriever_query_engine.py", line 142, in _query
    nodes = self._retriever.retrieve(query_bundle)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/llama_index/indices/base_retriever.py", line 21, in retrieve
    return self._retrieve(str_or_query_bundle)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/llama_index/indices/document_summary/retrievers.py", line 80, in _retrieve
    raw_choices, relevances = self._parse_choice_select_answer_fn(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/llama_index/indices/utils.py", line 100, in default_parse_choice_select_answer_fn
    answer_num = int(line_tokens[0].split(":")[1].strip())
                     ~~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
Could you try reading the file like this?
Plain Text
# open the file with an explicit encoding; the with-block closes it afterwards
with open("data.txt", "r", encoding="utf-8") as file1:
    # read the file
    read_content = file1.read()
It looks like the LLM isn't following instructions: the retriever is failing at the "selecting" step, where it parses the LLM's choice of document. This seems to be a small bug in how the document summary index handles a single document; it shouldn't need to "select" a document when there is only one.
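To make that failure concrete, here is a minimal sketch built around the expression on the last traceback line. `parse_choice_number` is a hypothetical stand-in, not the library's actual function, and the comma split and the two sample strings are assumptions chosen for illustration.

```python
def parse_choice_number(answer_line: str) -> int:
    # Hypothetical stand-in: assumes the line is split on "," first, then
    # applies the same expression that appears in the traceback.
    line_tokens = answer_line.split(",")
    return int(line_tokens[0].split(":")[1].strip())


# A reply in the "label: number" shape that the expression expects (assumed format).
well_formed = "Doc: 1, Relevance: 9"
# A free-form reply with no colon, as an LLM might produce when it ignores the format.
free_form = "This article describes a new product launch"

print(parse_choice_number(well_formed))

try:
    parse_choice_number(free_form)
except IndexError as exc:
    # With no ":" in the first token, split(":") yields one element, so [1] fails.
    print(f"IndexError: {exc}")
```

So any reply that lacks the colon-separated choice would hit the same `IndexError`, regardless of how many documents are in the index.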

Although if you just want the summary of a file, I would use a ListIndex with response_mode="tree_summarize"

Plain Text
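from llama_index import ListIndex

# build a list index over the single document
index = ListIndex.from_documents([document])
query_engine = index.as_query_engine(response_mode="tree_summarize")
# query through the query engine, not the index itself
response = query_engine.query("Summarize this document")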
Gotcha, noted. Quick question: what would you recommend if I want to select 10 times from a list of things? Is an agent the right thing to use?
I did end up getting it to work.
I had to change the way I was asking for a summary: rather than using the document summary index as a query engine, I used get_document_summary instead, and that worked really well.
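For anyone landing here later, a rough sketch of what that change might look like. The helper name `stored_summary` is made up for illustration, and the commented usage assumes the `document_summary_index` and `document` objects from the original post, plus a `doc_id` attribute on the document (the attribute name may differ between llama_index versions).

```python
def stored_summary(summary_index, doc_id):
    """Return the summary the index holds for `doc_id`.

    This asks the index for its stored summary directly instead of going
    through the query engine, so the retriever code path shown in the
    traceback is not involved.
    """
    return summary_index.get_document_summary(doc_id)


# Hypothetical usage with the objects from the original post:
# summary = stored_summary(document_summary_index, document.doc_id)
# print(f"Summary: {summary}")
```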