
Updated 9 months ago

Retriever

Hi, I am using DocumentSummaryIndexRetriever to summarize a PDF file.
pdf size: 64.5M
embedding: FastEmbedEmbedding
splitter: SemanticSplitterNodeParser
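# Imports assume the llama-index >= 0.10 package layout
from llama_index.core import DocumentSummaryIndex, Settings, get_response_synthesizer
from llama_index.core.indices.document_summary import DocumentSummaryIndexRetriever
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.embeddings.fastembed import FastEmbedEmbedding

Settings.embed_model = FastEmbedEmbedding()
splitter = SemanticSplitterNodeParser(embed_model=Settings.embed_model)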

# default mode of building the index
response_synthesizer = get_response_synthesizer(
    response_mode="tree_summarize", use_async=True
)
doc_summary_index = DocumentSummaryIndex.from_documents(
    docs,
    transformations=[splitter],
    response_synthesizer=response_synthesizer,
    show_progress=True,
)
retriever = DocumentSummaryIndexRetriever(
    doc_summary_index,
)

response_synthesizer = get_response_synthesizer(response_mode="tree_summarize")
retriever_query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)

output:
File document_summary\retrievers.py:98, in DocumentSummaryIndexLLMRetriever._retrieve(self, query_bundle)
     93 raw_response = self._llm.predict(
     94     self._choice_select_prompt,
     95     context_str=fmt_batch_str,
     96     query_str=query_str,
     97 )
---> 98 raw_choices, relevances = self._parse_choice_select_answer_fn(
     99     raw_response, len(summary_nodes)
    100 )
    101 choice_idxs = [choice - 1 for choice in raw_choices]
    102 choice_summary_ids = [summary_ids_batch[ci] for ci in choice_idxs]

File llama_index\core\indices\utils.py:104, in default_parse_choice_select_answer_fn(answer, num_choices, raise_error)
     98     else:
     99         raise ValueError(
    100             f"Invalid answer line: {answer_line}. "
    101             "Answer line must be of the form: "
    102             "answer_num: <int>, answer_relevance: <float>"
    103         )
--> 104 answer_num = int(line_tokens[0].split(":")[1].strip())
    105 if answer_num > num_choices:
    106     continue

IndexError: list index out of range

Can anyone help me?
5 comments
This is an issue with using the LLM as a selector. Sometimes the LLM does not produce output in the format the parser expects ("answer_num: <int>, answer_relevance: <float>"), so parsing fails.
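To illustrate the failure mode, here is a minimal reproduction of just the expression shown on line 104 of the traceback. It is a sketch, not the library's code: the helper name `parse_answer_num`, the comma split that produces `line_tokens`, and the sample reply strings are all assumptions made for the example.

```python
def parse_answer_num(answer_line: str) -> int:
    """Mirror the expression from the traceback (hypothetical helper)."""
    # Assumption: line_tokens comes from splitting the answer line on commas.
    line_tokens = answer_line.split(",")
    # Same expression as line 104: take the text after the first colon.
    return int(line_tokens[0].split(":")[1].strip())


# A well-formed line parses fine.
print(parse_answer_num("Doc: 2, Relevance: 8"))  # prints 2

# A free-text reply has no colon, so split(":") yields a single element
# and indexing [1] fails exactly as in the traceback.
try:
    parse_answer_num("None of the documents are relevant.")
except IndexError as exc:
    print(f"IndexError: {exc}")  # prints "IndexError: list index out of range"
```

So the PDF content is not what raises the error directly. The exception appears whenever the model replies with a line that lacks the "label: value" shape.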

What LLM are you using? I would use the document summary embedding retriever (DocumentSummaryIndexEmbeddingRetriever) instead.
I use gpt-3.5-turbo
OK. I will try the document summary embedding retriever.
Is it possible that there is an issue with some page text information in the PDF file?
I checked the LlamaIndex source code and did not see any preprocessing.
Nah, it's just an issue with the LLM not producing output in the expected format. The error comes from parsing the LLM output.
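If you want to stay with the LLM retriever, one possible workaround is a more forgiving parser that skips malformed lines instead of raising. The sketch below is an assumption-laden example, not LlamaIndex code: the function name `tolerant_parse_choice_select_answer` and the regex are made up here, and the signature simply copies the `(answer, num_choices, raise_error)` arguments visible in the traceback.

```python
import re
from typing import List, Tuple

# Matches "<label>: <int>, <label>: <number>", e.g. "Doc: 2, Relevance: 8".
_LINE_RE = re.compile(r"^[^:,]*:\s*(\d+)\s*,[^:]*:\s*(\d+(?:\.\d+)?)")


def tolerant_parse_choice_select_answer(
    answer: str, num_choices: int, raise_error: bool = False
) -> Tuple[List[int], List[float]]:
    """Parse LLM choice lines, skipping any line that is malformed."""
    answer_nums: List[int] = []
    answer_relevances: List[float] = []
    for line in answer.splitlines():
        match = _LINE_RE.match(line.strip())
        if match is None:
            if raise_error:
                raise ValueError(f"Invalid answer line: {line!r}")
            continue  # ignore chatter, blank lines, apologies, etc.
        answer_num = int(match.group(1))
        if answer_num < 1 or answer_num > num_choices:
            continue  # ignore choices that point outside the batch
        answer_nums.append(answer_num)
        answer_relevances.append(float(match.group(2)))
    return answer_nums, answer_relevances


# Invented sample reply mixing valid lines, free text, and a bad index.
reply = (
    "Doc: 2, Relevance: 8\n"
    "Sorry, I cannot rank the rest.\n"
    "Doc: 9, Relevance: 5\n"
    "Doc: 1, Relevance: 7.5"
)
print(tolerant_parse_choice_select_answer(reply, num_choices=3))
# prints ([2, 1], [8.0, 7.5])
```

The traceback shows the retriever calling `self._parse_choice_select_answer_fn`, so a function like this can probably be supplied when constructing the retriever (likely through a `parse_choice_select_answer_fn` argument). Check the constructor in your installed version before relying on that.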