Hello, I am trying to use HTML-tag-style output formatting with `MultiModalLLMCompletionProgram`, since in my experience it is much more reliable than the JSON output format. I can modify the output format prompt, and the model returns exactly the expected tags, but the program still raises an error saying it could not extract a JSON string from the output:
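```python
from llama_index.core.output_parsers import PydanticOutputParser
from llama_index.core.program import MultiModalLLMCompletionProgram

# FrameAnalysisOutput (a Pydantic model with people_count and background fields),
# prompt_template_str, mm_model and image_docs are defined elsewhere.

# Custom format instructions: ask for HTML-like tags instead of JSON
format_string = '''
Format your response using the following HTML-like tags:
<people_count>[number]</people_count>
<background>[description]</background>
Do not include any other text or preamble.
'''

output_parser = PydanticOutputParser(
    output_cls=FrameAnalysisOutput, pydantic_format_tmpl=format_string
)

llm_program = MultiModalLLMCompletionProgram.from_defaults(
    output_parser=output_parser,
    prompt_template_str=prompt_template_str,
    multi_modal_llm=mm_model,
    image_documents=image_docs,
    verbose=True,
)
```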
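As far as I can tell from the llama_index source, `pydantic_format_tmpl` only changes the instructions appended to the prompt, while `PydanticOutputParser.parse()` still passes the raw text through a JSON-extraction helper before validating it. The sketch below approximates that extraction step (the name `extract_json_str_sketch` is hypothetical, not the library's) to show why a tag-only answer with no braces always fails, whatever the prompt asked for.

```python
import re


def extract_json_str_sketch(text: str) -> str:
    """Approximation of the JSON-extraction step run before Pydantic validation.

    Assumption: it grabs the span from the first "{" to the last "}" and
    raises if no such span exists.
    """
    match = re.search(r"\{.*\}", text.strip(), re.DOTALL)
    if not match:
        raise ValueError(f"Could not extract json string from output: {text}")
    return match.group()


raw_output = (
    "<people_count>2</people_count>\n"
    "<background>A person sitting on a couch.</background>"
)

try:
    extract_json_str_sketch(raw_output)
except ValueError as err:
    # The tag-formatted answer contains no braces, so extraction fails
    # regardless of the format instructions given to the model.
    print(err)
```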
Output:

```text
Starting MultiModalLLMCompletionProgram...
> Raw output: <people_count>2</people_count>
<background>A person sitting on a couch, covered by a pink blanket. They are positioned against white striped curtains.</background>
Error during frame analysis: Could not extract json string from output: <people_count>2</people_count>
```
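One possible workaround is to parse the tags directly instead of going through JSON. This is a stdlib-only sketch: `FrameAnalysis` is a hypothetical stand-in for `FrameAnalysisOutput` (assuming an `int` `people_count` and a `str` `background`, inferred from the tags), and `parse_tagged_output` / `parse_frame_analysis` are hypothetical helper names.

```python
import re
from dataclasses import dataclass


@dataclass
class FrameAnalysis:
    # Hypothetical stand-in for FrameAnalysisOutput; field names match the tags.
    people_count: int
    background: str


# Matches <tag>value</tag>; the backreference \1 requires matching open/close names.
TAG_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def parse_tagged_output(text: str) -> dict:
    """Collect every <tag>value</tag> pair into a dict of stripped strings."""
    return {name: value.strip() for name, value in TAG_RE.findall(text)}


def parse_frame_analysis(text: str) -> FrameAnalysis:
    """Build a FrameAnalysis from tag output, failing loudly if a tag is missing."""
    fields = parse_tagged_output(text)
    missing = {"people_count", "background"} - fields.keys()
    if missing:
        raise ValueError(f"Missing tags {sorted(missing)} in output: {text}")
    return FrameAnalysis(
        people_count=int(fields["people_count"]),
        background=fields["background"],
    )


raw_output = (
    "<people_count>2</people_count>\n"
    "<background>A person sitting on a couch, covered by a pink blanket.</background>"
)
print(parse_frame_analysis(raw_output))
```

To wire this into the program, one untested option (assuming the current `llama_index.core` API, where the program calls `output_parser.parse(raw_output)` on the model's text) is to subclass `PydanticOutputParser` and override `parse` to return `self.output_cls(**parse_tagged_output(text))`, letting Pydantic coerce `"2"` to an integer. That skips the JSON-extraction step entirely while keeping the custom `pydantic_format_tmpl` for the prompt.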