Hello, I am trying to use HTML-tag-style output formatting in MultiModalLLMCompletionProgram, since it is much more reliable than the JSON output format. I can modify the output format prompt, and even though I get the expected results, it throws an error saying it could not extract a JSON string from the output.

Plain Text
format_string = '''
Format your response using the following HTML-like tags:
<people_count>[number]</people_count>
<background>[description]</background>

Do not include any other text or preamble.
'''

output_parser = PydanticOutputParser(
    output_cls=FrameAnalysisOutput, pydantic_format_tmpl=format_string)

llm_program = MultiModalLLMCompletionProgram.from_defaults(
    output_parser=output_parser,
    prompt_template_str=prompt_template_str,
    multi_modal_llm=mm_model,
    image_documents=image_docs,
    verbose=True,
)


Plain Text
Starting MultiModalLLMCompletionProgram...
> Raw output: <people_count>2</people_count>
<background>A person sitting on a couch, covered by a pink blanket. They are positioned against white striped curtains.</background>
Error during frame analysis: Could not extract json string from output: <people_count>2</people_count>
right, because the output parser is looking for json, not html πŸ‘€

I think you'd have to write a custom output parser here
hmm, is there any way to use MultiModalLLMCompletionProgram without an output class or a parser then?
if not, I will try writing one
Yea, I don't think there is. You'd have to write an output parser that parses the HTML in this case
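A custom parser along these lines might work. This is a minimal sketch: a plain dataclass stands in for the user's actual pydantic `FrameAnalysisOutput` model, and the field names and regex are assumptions based on the tags in the prompt above. To plug it into the program you would still wrap it in a class implementing the parser interface that llama_index expects.

```python
import re
from dataclasses import dataclass


# Stand-in for the user's pydantic FrameAnalysisOutput model (assumed fields).
@dataclass
class FrameAnalysisOutput:
    people_count: int
    background: str


def parse_tagged_output(text: str) -> FrameAnalysisOutput:
    """Extract the <people_count> and <background> tags from raw LLM output."""
    def tag(name: str) -> str:
        # Non-greedy match between the opening and closing tag.
        match = re.search(rf"<{name}>(.*?)</{name}>", text, re.DOTALL)
        if match is None:
            raise ValueError(f"missing <{name}> tag in output")
        return match.group(1).strip()

    return FrameAnalysisOutput(
        people_count=int(tag("people_count")),
        background=tag("background"),
    )
```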
You could also skip the program completely, and prompt the LLM directly

Plain Text
response = str(llm.complete("..."))
pydantic_obj = parse_response(response)
I want to use the parallel function calling I added in the PR here, but I believe it only works with that program. Am I wrong? https://github.com/run-llama/llama_index/pull/16091
basically, I am trying to call a multimodal Ollama model to query an image, and I want to do this in parallel so I can process multiple images at once
nevermind, I can just use asyncio and the acomplete methods, right 😅
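The asyncio approach could look like this. A minimal sketch: `acomplete_stub` is a hypothetical stand-in for the model's async completion call, so the fan-out pattern with `asyncio.gather` is the only part being illustrated.

```python
import asyncio


# Hypothetical stand-in for the multimodal model's async completion call;
# in practice this would invoke the LLM with the prompt and one image.
async def acomplete_stub(prompt: str, image_doc: str) -> str:
    await asyncio.sleep(0)  # simulate async I/O
    return f"analysis of {image_doc}"


async def analyze_frames(prompt: str, image_docs: list[str]) -> list[str]:
    # Fire off one completion per image and await them all concurrently.
    tasks = [acomplete_stub(prompt, doc) for doc in image_docs]
    return await asyncio.gather(*tasks)


results = asyncio.run(
    analyze_frames("Describe the frame.", ["frame1.png", "frame2.png"])
)
```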