The decomposable query seems reasonable. How about using a ReAct agent?
I've done something similar manually. I defined an arbitrary data model using SQLModel, then walked through the model to have the LLM auto-populate instances using a vector store index. It generates JSON schema-compliant results stored in Postgres, essentially acting as an LLM-based database generator. You provide the data model and input data, and it generates an SQL database. Pretty cool.
Since the fields aren't known ahead of time, I inspect the model at runtime and use the field descriptions to have the LLM generate questions from the schema. I then have another LLM answer those questions against my context.
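For illustration, here's a minimal sketch of that inspection step. The real code uses SQLModel (where descriptions come from `Field(description=...)` and `model_json_schema()`); this stdlib stand-in uses a dataclass with metadata, and the model, field names, and descriptions are all made up:

```python
import json
from dataclasses import dataclass, field, fields

@dataclass
class Business:
    # Descriptions live in field metadata; SQLModel/Pydantic expose the
    # same information via Field(description=...) on the model.
    id: int = field(metadata={"description": "Unique identifier"})
    name: str = field(metadata={"description": "Legal name of the business"})
    industry: str = field(metadata={"description": "Primary industry sector, e.g. 'retail'"})

def field_specs(model) -> list[dict]:
    """Walk the model and collect name/description pairs for prompting,
    excluding unique-identifier fields like 'id'."""
    return [
        {"field": f.name, "description": f.metadata.get("description", "")}
        for f in fields(model)
        if f.name != "id"
    ]

# Each spec becomes the schema snippet handed to the question-generating LLM.
print(json.dumps(field_specs(Business), indent=2))
```

The key point is that nothing about the target schema is hard-coded: swap in a different model and the same walk produces a different set of question prompts.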
Here's the base prompt I'm using:
base_prompt = (
"Your ROLE is a senior research analyst on a team providing detailed information about businesses. "
"You will be given the business's name, a specific field name, and a JSON schema containing a description of that field "
"and all relevant sub-fields for comprehensive understanding.\n"
"Your TASK is to generate a thorough question that fully elucidates the field. Please adhere to the following guidelines: \n"
" - Include all subfields in a single, comprehensive question.\n"
" - Exclude unique identifier fields like 'id'.\n"
" - Use examples in the schema to craft precise and informative questions.\n"
" - Do not reference database terminology.\n"
"Emit only the question without any conversational elements or prefaces.\n"
)
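And a hedged sketch of how that prompt might be assembled per field; the message structure, helper name, and example schema are mine, not from the original code:

```python
import json

# Abridged stand-in for the full base prompt shown above (unchanged in spirit).
base_prompt = (
    "Your ROLE is a senior research analyst on a team providing detailed "
    "information about businesses. Emit only the question."
)

def build_messages(business: str, field_name: str, schema: dict) -> list[dict]:
    """Assemble a chat-completion message list for one field.
    The names and layout here are illustrative."""
    user = (
        f"Business: {business}\n"
        f"Field: {field_name}\n"
        f"JSON schema:\n{json.dumps(schema, indent=2)}"
    )
    return [
        {"role": "system", "content": base_prompt},
        {"role": "user", "content": user},
    ]

messages = build_messages(
    "Acme Corp",
    "industry",
    {"description": "Primary industry sector", "examples": ["retail"]},
)
```

The resulting `messages` list is what you'd hand to your chat-completion client of choice, once per non-id field.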
I also have a QA agent check whether the generated answer satisfies the question. If it does, the model is hydrated with the data; if not, the failed question/answer pair is stored and the question generator tries again. There's also a fallback strategy that reduces strictness to increase recall on failed attempts.
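That check-and-retry loop might look roughly like this; the judge, the relaxation policy, and both stub functions are placeholders for the real LLM calls, not the actual implementation:

```python
def answer_question(question: str, context: str) -> str:
    # Placeholder for the answering LLM call.
    return f"(answer derived from context for: {question})"

def judge(question: str, answer: str, strict: bool) -> bool:
    # Placeholder for the QA agent: in strict mode it would demand full
    # coverage of the question; relaxed mode trades precision for recall.
    return bool(answer) and (not strict or "answer" in answer)

def hydrate_field(question: str, context: str, max_attempts: int = 2):
    """Return an accepted answer for one field, or None after retries."""
    failed_pairs = []  # stored for inspection and for re-prompting
    strict = True
    for _ in range(max_attempts):
        answer = answer_question(question, context)
        if judge(question, answer, strict):
            return answer  # value used to hydrate the model instance
        failed_pairs.append((question, answer))
        strict = False  # fallback: relax strictness to increase recall
    return None
```

One design note: keeping the failed question/answer pairs around (rather than discarding them) gives the question generator concrete negative examples to steer away from on the retry.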