Hi, I'm having an issue when trying to use a vector index with the SubQuestionEngine - It is always erroring :/

Plain Text
import os
import openai

os.environ['OPENAI_API_KEY'] = "sk-***"
openai.api_key = os.environ["OPENAI_API_KEY"]

from llama_index import Document, SummaryIndex, VectorStoreIndex, SimpleDirectoryReader
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.callbacks import CallbackManager, LlamaDebugHandler
from llama_index import ServiceContext
from llama_index.question_gen.llm_generators import LLMQuestionGenerator


from qdrant_client import QdrantClient
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Using the LlamaDebugHandler to print the trace of the sub questions
# captured by the SUB_QUESTION callback event type
llama_debug = LlamaDebugHandler(print_trace_on_end=True)
callback_manager = CallbackManager([llama_debug])
service_context = ServiceContext.from_defaults(
    callback_manager=callback_manager
)

vector_index = VectorStoreIndex.from_documents(
    documents=[
        Document(text="we have the color green for trees")
    ], 
    service_context=service_context
)
vector_query_engine = vector_index.as_query_engine()

# setup base query engine as tool
query_engine_tools = [
    QueryEngineTool(
        query_engine=vector_query_engine,
        metadata=ToolMetadata(
            name="tree_colors",
            description="Everything around trees",
        ),
    ),
]

query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=query_engine_tools,
    service_context=service_context,
    question_gen=LLMQuestionGenerator.from_defaults(
        service_context=service_context
    )
)

response = query_engine.query(
    "What is the color for trees?"
)

print(response)

20 comments
What is the error?
Plain Text
Traceback (most recent call last):
  File "/python3.11/site-packages/pydantic/v1/main.py", line 522, in parse_obj
    obj = dict(obj)
          ^^^^^^^^^
ValueError: dictionary update sequence element #0 has length 1; 2 is required

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 130, in <module>
  File "<string>", line 116, in python_function
  File "/python3.11/site-packages/llama_index/core/base_query_engine.py", line 40, in query
    return self._query(str_or_query_bundle)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/python3.11/site-packages/llama_index/query_engine/sub_question_query_engine.py", line 129, in _query
    sub_questions = self._question_gen.generate(self._metadatas, query_bundle)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/python3.11/site-packages/llama_index/question_gen/llm_generators.py", line 78, in generate
    parse = self._prompt.output_parser.parse(prediction)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/python3.11/site-packages/llama_index/question_gen/output_parser.py", line 15, in parse
    sub_questions = [SubQuestion.parse_obj(item) for item in json_dict]
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/python3.11/site-packages/llama_index/question_gen/output_parser.py", line 15, in <listcomp>
    sub_questions = [SubQuestion.parse_obj(item) for item in json_dict]
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/python3.11/site-packages/pydantic/v1/main.py", line 525, in parse_obj
    raise ValidationError([ErrorWrapper(exc, loc=ROOT_KEY)], cls) from e
pydantic.v1.error_wrappers.ValidationError: 1 validation error for SubQuestion
__root__
  SubQuestion expected dict not str (type=type_error)
Interesting, the LLM produces the wrong output for us to parse

In any case, since you are using OpenAI, it's best to use the OpenAIQuestionGenerator, since it uses the function-calling API
the strange / interesting part is that this always happens when using a vector index -> with a summary index it always works
the LLM question generator just prompts the LLM to generate a json that we can parse, using the query + tool name/description as inputs
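For context, here is a minimal sketch of why the ValidationError above occurs. The field names (`sub_question`, `tool_name`) are illustrative assumptions about the expected schema; the key point is that the parser feeds each JSON array item to a pydantic `parse_obj`-style call, which needs a dict, not a bare string:

```python
import json

# What a well-behaved LLM should return: a JSON array of objects.
good = json.loads(
    '[{"sub_question": "What is the color for trees?", "tool_name": "tree_colors"}]'
)

# What a weaker prompt-only generator sometimes returns instead:
# a JSON array of plain strings.
bad = json.loads('["What is the color for trees?"]')

# Each item is passed to SubQuestion.parse_obj(item). A dict parses fine;
# a bare string triggers "SubQuestion expected dict not str (type=type_error)".
for item in good:
    assert isinstance(item, dict)
for item in bad:
    assert isinstance(item, str)
```

The function-calling API constrains the model's output to a schema, which is why OpenAIQuestionGenerator avoids this failure mode.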
The OpenAIQuestionGenerator is 100x more reliable
yes, it is working now - just one other thing I noticed: SubQuestionQueryEngine -> the doc says it is using default rue, but the constructor sets the default to false
Sorry, default rue ? Not sure what that refers to
Hi, I am sorry to interrupt your conversation, but I am facing the same issue when running the notebook https://docs.llamaindex.ai/en/stable/examples/metadata_extraction/MetadataExtractionSEC.html. I didn't modify any code. I am sorry, I am new to LlamaIndex and a bit confused about how to change the code. I believe I need to change this part:
Plain Text
final_engine_no_metadata = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=[
        QueryEngineTool(
            query_engine=engine_no_metadata,
            metadata=ToolMetadata(
                name="sec_filing_documents",
                description="financial information on companies",
            ),
        )
    ],
    question_gen=question_gen,
    use_async=True,
)

But I don't know how
Instead of using LLMQuestionGenerator, use OpenAIQuestionGenerator

Plain Text
from llama_index.question_gen.llm_generators import OpenAIQuestionGenerator, DEFAULT_OPENAI_SUB_QUESTION_PROMPT_TMPL

service_context = ServiceContext.from_defaults(
    llm=llm, text_splitter=text_splitter
)
question_gen = OpenAIQuestionGenerator.from_defaults(
    llm=service_context.llm,
    prompt_template_str="""
        Follow the example, but instead of giving a question, always prefix the question 
        with: 'By first identifying and quoting the most relevant sources, '. 
        """
    + DEFAULT_OPENAI_SUB_QUESTION_PROMPT_TMPL,
)
Thank you. But I am sorry now I am facing another issue:
Plain Text
ImportError                               Traceback (most recent call last)
<ipython-input-14-24df7212ac1e> in <cell line: 16>()
     14 # )
     15 
---> 16 from llama_index.question_gen.llm_generators import OpenAIQuestionGenerator, DEFAULT_OPENAI_SUB_QUESTION_PROMPT_TMPL
     17 
     18 service_context = ServiceContext.from_defaults(

ImportError: cannot import name 'OpenAIQuestionGenerator' from 'llama_index.question_gen.llm_generators' (/usr/local/lib/python3.10/dist-packages/llama_index/question_gen/llm_generators.py)

---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.
There is a typo.
Plain Text
from llama_index.question_gen.openai_generator import OpenAIQuestionGenerator, DEFAULT_OPENAI_SUB_QUESTION_PROMPT_TMPL

Now it works, thank you!
ah thanks for the fix!
default is false, doc says true, and from_defaults sets it to true
ah I see. That's confusing lol
it should be true there
thanks for pointing that out
A follow-up on this issue: I noticed OpenAIQuestionGenerator improved the results noticeably.
response_no_metadata with LLMQuestionGenerator from the original notebook:
Plain Text
Generated 4 sub questions.
[sec_filing_documents] Q: What was the cost due to research and development for Uber in 2019
[sec_filing_documents] Q: What was the cost due to sales and marketing for Uber in 2019
[sec_filing_documents] Q: What was the cost due to research and development for Lyft in 2019
[sec_filing_documents] Q: What was the cost due to sales and marketing for Lyft in 2019
[sec_filing_documents] A: The cost due to sales and marketing for Uber in 2019 was $814,122 in thousands.
[sec_filing_documents] A: The cost due to research and development for Uber in 2019 was $1,505,640 in thousands.
[sec_filing_documents] A: The cost of research and development for Lyft in 2019 was $1,505,640 in thousands.
[sec_filing_documents] A: The cost due to sales and marketing for Lyft in 2019 was $814,122 in thousands.
{
  "Uber": {
    "Research and Development": 1505.64,
    "Sales and Marketing": 814.122
  },
  "Lyft": {
    "Research and Development": 1505.64,
    "Sales and Marketing": 814.122
  }
}
response_no_metadata with OpenAIQuestionGenerator:
Plain Text
Generated 4 sub questions.
[sec_filing_documents] Q: By first identifying and quoting the most relevant sources, what was the cost due to research and development for Uber in 2019 in millions of USD?
[sec_filing_documents] Q: By first identifying and quoting the most relevant sources, what was the cost due to sales and marketing for Uber in 2019 in millions of USD?
[sec_filing_documents] Q: By first identifying and quoting the most relevant sources, what was the cost due to research and development for Lyft in 2019 in millions of USD?
[sec_filing_documents] Q: By first identifying and quoting the most relevant sources, what was the cost due to sales and marketing for Lyft in 2019 in millions of USD?
[sec_filing_documents] A: The cost due to sales and marketing for Lyft in 2019 was $4,626 million.
[sec_filing_documents] A: The cost due to sales and marketing for Uber in 2019 was $4,626 million.
[sec_filing_documents] A: The cost due to research and development for Uber in 2019 was $4,836 million.
[sec_filing_documents] A: The cost due to research and development for Lyft in 2019 was $1,505,640 in thousands of USD, which is equivalent to $1,505.64 million.
{
  "Uber": {
    "Research and Development": 4836,
    "Sales and Marketing": 4626
  },
  "Lyft": {
    "Research and Development": 1505.64,
    "Sales and Marketing": 4626
  }
}