Is there a difference between summary

At a glance

The community members are discussing the difference between a "summary index" and a "document summary index". One community member explains that a summary index sends all nodes to the language model to refine an answer, while a document summary index generates a summary for each document and uses those summaries to select which document(s) to use for the query.

The community members are also discussing an issue with the "DocumentSummaryIndexLLMRetriever" where it is throwing an "IndexError" when trying to retrieve a summary. Another community member suggests that the language model is having trouble selecting the appropriate summary, and recommends using an embedding retriever instead.

Uzair

Is there a difference between summary index and document summary index? Can anyone help me better understand these two?

9 comments

Uzair

@kapa.ai

kapa.ai

Let me check the knowledge base for you. One moment! :books:

kapa.ai

Sorry @Uzair, I seem to be having a bit of trouble. Don't worry though, I've pinged my support and they will try to fix me up! :technologist:

Perhaps try again in 15mins?

Logan M

Summary index sends all nodes to the LLM to refine an answer.

Document summary index generates a summary for each document, and uses those summaries to pick which document(s) to use for the query.
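
To make the contrast concrete, here is a toy sketch in plain Python. It does not use llama-index at all: the `Doc` class and both functions are hypothetical, and simple word overlap stands in for whatever the real index uses (an LLM or embeddings) to compare a query against the summaries.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a document with a precomputed summary."""
    doc_id: str
    summary: str
    nodes: list[str]


def summary_index_nodes(docs):
    """Summary-index style: every node of every document is passed on."""
    return [node for doc in docs for node in doc.nodes]


def document_summary_nodes(docs, query, top_k=1):
    """Document-summary-index style: rank documents by their summary,
    then pass on only the nodes of the top_k documents.

    Word overlap is an assumption made for this sketch; it is not how
    llama-index scores summaries.
    """
    query_words = set(query.lower().split())

    def overlap(doc):
        return len(query_words & set(doc.summary.lower().split()))

    ranked = sorted(docs, key=overlap, reverse=True)
    return [node for doc in ranked[:top_k] for node in doc.nodes]


docs = [
    Doc("cf", "treatments for cystic fibrosis", ["CF node 1", "CF node 2"]),
    Doc("flu", "seasonal influenza vaccines", ["Flu node 1"]),
]

all_nodes = summary_index_nodes(docs)
selected = document_summary_nodes(docs, "cystic fibrosis treatments", top_k=1)
print(all_nodes)
print(selected)
```

The first call returns the nodes of both documents, while the second returns only the nodes of the document whose summary best matches the query.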

Uzair

Thank you, I am having another problem.

Uzair

I created a DocumentSummaryIndexLLMRetriever as shown:


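from llama_index.core.indices.document_summary import DocumentSummaryIndexLLMRetriever

# doc_summary_index is a DocumentSummaryIndex built earlier (not shown in this thread)
retriever = DocumentSummaryIndexLLMRetriever(
    index=doc_summary_index,
    choice_batch_size=10,  # number of summaries shown to the LLM per call
    choice_top_k=2,  # number of documents to keep
)

# The query is in Spanish: "Give a summary of the treatments for cystic fibrosis"
retrieved_nodes = retriever.retrieve("Dé un resumen de los tratamientos para la fibrosis quística")
print(len(retrieved_nodes))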

Uzair

But it raises an error:


---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-35-3639e5dd6690> in <cell line: 1>()
----> 1 retrieved_nodes = retriever.retrieve("Dé un resumen de los tratamientos para la fibrosis quística")
      2 print(len(retrieved_nodes))

3 frames
/usr/local/lib/python3.10/dist-packages/llama_index/core/instrumentation/dispatcher.py in wrapper(func, instance, args, kwargs)
    272             )
    273             try:
--> 274                 result = func(*args, **kwargs)
    275             except BaseException as e:
    276                 self.event(SpanDropEvent(span_id=id_, err_str=str(e)))

/usr/local/lib/python3.10/dist-packages/llama_index/core/base/base_retriever.py in retrieve(self, str_or_query_bundle)
    242                 payload={EventPayload.QUERY_STR: query_bundle.query_str},
    243             ) as retrieve_event:
--> 244                 nodes = self._retrieve(query_bundle)
    245                 nodes = self._handle_recursive_retrieval(query_bundle, nodes)
    246                 retrieve_event.on_end(

/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/document_summary/retrievers.py in _retrieve(self, query_bundle)
     96                 query_str=query_str,
     97             )
---> 98             raw_choices, relevances = self._parse_choice_select_answer_fn(
     99                 raw_response, len(summary_nodes)
    100             )

/usr/local/lib/python3.10/dist-packages/llama_index/core/indices/utils.py in default_parse_choice_select_answer_fn(answer, num_choices, raise_error)
    102                     "answer_num: <int>, answer_relevance: <float>"
    103                 )
--> 104         answer_num = int(line_tokens[0].split(":")[1].strip())
    105         if answer_num > num_choices:
    106             continue

IndexError: list index out of range

Uzair

Can you help me with this error? Why is it occurring, and how do I fix it?

Logan M

The LLM is selecting which summary to use, and failing to pick in a way that llama-index can parse.

The LLM retriever is a tad flaky this way; tbh I would just use the embedding retriever.
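
To illustrate the failure, here is a small sketch in plain Python. `strict_parse_line` reproduces only the expression on line 104 of the traceback; `tolerant_parse` is a hypothetical alternative, not part of llama-index, and the "Doc: <int>, Relevance: <number>" reply format is an assumption about what the LLM is asked to return.

```python
import re


def strict_parse_line(line):
    """Same expression as line 104 in the traceback above."""
    line_tokens = line.split(",")
    # If the first token contains no ":", split(":") returns a one-item
    # list and indexing [1] raises IndexError.
    return int(line_tokens[0].split(":")[1].strip())


# Assumed reply format: "<label>: <int>, <label>: <number>"
_CHOICE_RE = re.compile(r"^\s*\w+\s*:\s*(\d+)\s*,\s*\w+\s*:\s*(\d+(?:\.\d+)?)\s*$")


def tolerant_parse(answer, num_choices):
    """Hypothetical parser: skip lines that do not match the assumed format
    instead of raising. Returns (choice_numbers, relevances)."""
    choices, relevances = [], []
    for line in answer.splitlines():
        match = _CHOICE_RE.match(line)
        if match is None:
            continue  # free-form text from the LLM, ignore it
        number = int(match.group(1))
        if number < 1 or number > num_choices:
            continue  # the LLM pointed at a summary that was not offered
        choices.append(number)
        relevances.append(float(match.group(2)))
    return choices, relevances


# A reply line with a comma but no colon triggers the IndexError.
try:
    strict_parse_line("No documents are relevant, sorry")
    strict_failed = False
except IndexError:
    strict_failed = True

reply = "Doc: 2, Relevance: 8\nNo other documents are relevant, sorry\nDoc: 7, Relevance: 3"
choices, relevances = tolerant_parse(reply, num_choices=3)
print(strict_failed, choices, relevances)
```

The traceback shows the retriever calling self._parse_choice_select_answer_fn(raw_response, len(summary_nodes)) and unpacking two return values, so a function with this shape could in principle replace the default parser; whether your installed version lets you pass one in is worth checking in its source. The simpler route is the one suggested above: switch to the embedding-based retriever, which compares the query against summary embeddings and does not depend on parsing an LLM reply.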
