Hello, I am trying to use HTML-tag-style output formatting in MultiModalLLMCompletionProgram, since it is much more reliable than the JSON output format. I can modify the output format prompt, and even though I get the expected results, it throws an error saying it could not extract a JSON string from the output.

Plain Text
format_string = '''
Format your response using the following HTML-like tags:
<people_count>[number]</people_count>
<background>[description]</background>

Do not include any other text or preamble.
'''

output_parser = PydanticOutputParser(
    output_cls=FrameAnalysisOutput, pydantic_format_tmpl=format_string)

llm_program = MultiModalLLMCompletionProgram.from_defaults(
    output_parser=output_parser,
    prompt_template_str=prompt_template_str,
    multi_modal_llm=mm_model,
    image_documents=image_docs,
    verbose=True,
)


Plain Text
Starting MultiModalLLMCompletionProgram...
> Raw output: <people_count>2</people_count>
<background>A person sitting on a couch, covered by a pink blanket. They are positioned against white striped curtains.</background>
Error during frame analysis: Could not extract json string from output: <people_count>2</people_count>
right, because the output parser is looking for json, not html πŸ‘€

I think you'd have to write a custom output parser here
hmm, is there any way to use MultiModalLLMCompletionProgram without an output class or a parser then?
if not, I will try writing one
Yea, I don't think there is. You'd have to write an output parser that parses the HTML in this case
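A custom parser along these lines might work. This is a minimal sketch: a plain dataclass stands in for the user's actual pydantic `FrameAnalysisOutput` model, and the field names and regex are assumptions based on the tags in the prompt above. To plug it into the program you would still wrap it in a class implementing the parser interface that llama_index expects.

```python
import re
from dataclasses import dataclass


# Stand-in for the user's pydantic FrameAnalysisOutput model (assumed fields).
@dataclass
class FrameAnalysisOutput:
    people_count: int
    background: str


def parse_tagged_output(text: str) -> FrameAnalysisOutput:
    """Extract the <people_count> and <background> tags from raw LLM output."""
    def tag(name: str) -> str:
        # Non-greedy match between the opening and closing tag.
        match = re.search(rf"<{name}>(.*?)</{name}>", text, re.DOTALL)
        if match is None:
            raise ValueError(f"missing <{name}> tag in output")
        return match.group(1).strip()

    return FrameAnalysisOutput(
        people_count=int(tag("people_count")),
        background=tag("background"),
    )
```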
You could also skip the program completely, and prompt the LLM directly

Plain Text
response = str(llm.complete("..."))
pydantic_obj = parse_response(response)
I want to use the parallel function calling I added in the PR here, but I believe it only works with that program. Am I wrong? https://github.com/run-llama/llama_index/pull/16091
basically, I am trying to call a multimodal Ollama model to query an image, and I want to do this in parallel so I can process multiple images at once
nevermind, I can just use asyncio and the acomplete methods, right 😅
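The asyncio approach could look like this. A minimal sketch: `acomplete_stub` is a hypothetical stand-in for the model's async completion call, so the fan-out pattern with `asyncio.gather` is the only part being illustrated.

```python
import asyncio


# Hypothetical stand-in for the multimodal model's async completion call;
# in practice this would invoke the LLM with the prompt and one image.
async def acomplete_stub(prompt: str, image_doc: str) -> str:
    await asyncio.sleep(0)  # simulate async I/O
    return f"analysis of {image_doc}"


async def analyze_frames(prompt: str, image_docs: list[str]) -> list[str]:
    # Fire off one completion per image and await them all concurrently.
    tasks = [acomplete_stub(prompt, doc) for doc in image_docs]
    return await asyncio.gather(*tasks)


results = asyncio.run(
    analyze_frames("Describe the frame.", ["frame1.png", "frame2.png"])
)
```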