Updated 11 months ago

When using `PydanticOutputParser` with `MultiModalLLMCompletionProgram`, how can I describe to the LLM what goes in each field of the `output_cls`? In some cases the fields are labeled differently in the image. For example, a surgeon is sometimes referred to as "surgeon" and other times as "provider", depending on the organization.
You can use Pydantic field descriptions:

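```python
# In llama_index >= 0.10 this module lives at llama_index.core.bridge.pydantic.
from llama_index.bridge.pydantic import Field, BaseModel

class Test(BaseModel):
    """A test class."""
    # The description tells the LLM what belongs in this field.
    name: str = Field(description="The name of a person.")
```
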
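Applied to the surgeon/provider case, a field description can spell out the alternative labels. The sketch below assumes plain Pydantic v2 rather than the `llama_index.bridge` import, and the `SurgicalRecord` model and its field names are hypothetical. As we understand it, `PydanticOutputParser` builds its format instructions from the model's JSON schema, so descriptions that appear in the schema are what the LLM gets to see.

```python
import json

from pydantic import BaseModel, Field


class SurgicalRecord(BaseModel):
    """Hypothetical model for fields extracted from an image of a surgical record."""

    # Mention every label the image might use for this value.
    surgeon_name: str = Field(
        description=(
            "Name of the operating surgeon. Depending on the organization, "
            "the image may label this as 'Surgeon' or as 'Provider'."
        )
    )
    procedure: str = Field(description="Name of the procedure performed.")


# The description is carried into the JSON schema generated from the model.
schema = SurgicalRecord.model_json_schema()
print(json.dumps(schema["properties"]["surgeon_name"], indent=2))
```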
Thanks @Logan M. I'm getting an error saying some required fields are missing. Is there a way to handle the case where a field doesn't map? The odd thing is that with Gemini Vision (multimodal) I don't get the error, but with GPT-4V the Pydantic parser reports missing fields.
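For the question of a field that doesn't map, one common Pydantic approach (not suggested in this thread) is to make such fields optional with a default, so validation passes when the LLM leaves the key out. A minimal sketch assuming Pydantic v2; the model and field names are hypothetical.

```python
from typing import Optional

from pydantic import BaseModel, Field


class SurgicalRecord(BaseModel):
    """Hypothetical model: surgeon_name may be absent from some images."""

    # Optional with a default: validation succeeds even if the LLM omits the key.
    surgeon_name: Optional[str] = Field(
        default=None,
        description="Name of the operating surgeon; may be labeled 'Provider'.",
    )
    # Still required: a missing value here raises ValidationError.
    procedure: str = Field(description="Name of the procedure performed.")


record = SurgicalRecord.model_validate_json('{"procedure": "Appendectomy"}')
print(record)  # surgeon_name=None procedure='Appendectomy'
```

The trade-off is that a genuinely missing value now passes silently as `None`, so only fields that can legitimately be absent should be made optional.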
@Logan M I got this working and wanted to post an update on the cause and solution. The issue was that GPT-4V was returning more complex, nested JSON than Gemini Vision, which caused errors in `PydanticOutputParser`. I changed the prompt to the following, and that resolved the issue:

Summarize what is in the image and return the answer in a valid flat JSON structure with single-level keys.
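The failure mode described above can be reproduced with Pydantic alone. The sketch assumes Pydantic v2, whose default behavior ignores unknown keys. The model and the nested shape (a hypothetical `record` wrapper) are illustrative, not GPT-4V's actual output. When the expected keys sit one level down, the flat model sees none of them and reports them as missing. The flat JSON requested by the revised prompt validates cleanly.

```python
from pydantic import BaseModel, Field, ValidationError


class SurgicalRecord(BaseModel):
    """Hypothetical flat output model with single-level keys."""

    surgeon_name: str = Field(description="Name of the surgeon (may be labeled 'Provider').")
    procedure: str = Field(description="Name of the procedure performed.")


# Hypothetical nested output: the expected keys are wrapped under "record".
nested = '{"record": {"surgeon_name": "Dr. Lee", "procedure": "Appendectomy"}}'
# Flat output with single-level keys, as the revised prompt asks for.
flat = '{"surgeon_name": "Dr. Lee", "procedure": "Appendectomy"}'


def try_parse(raw: str):
    """Validate raw JSON against SurgicalRecord; report missing fields on failure."""
    try:
        return SurgicalRecord.model_validate_json(raw)
    except ValidationError as exc:
        missing = [err["loc"][0] for err in exc.errors() if err["type"] == "missing"]
        print("missing fields:", missing)
        return None


print(try_parse(nested))  # prints the missing fields, then None
print(try_parse(flat))
```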