Best practices for querying structured output with Pydantic and GPT-4o

Hey guys, just wondering if there are any best practices for querying structured output with Pydantic. We are running a structured output call where the class has one property that is a union (it can be one of two Pydantic classes). This is causing a lot of validation errors because the LLM doesn't seem to understand it.

Would love to hear if anyone has ideas or has also experienced this. For context, we are using GPT-4o.
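
For reference, here's a minimal sketch of the shape I mean (the class and field names are made up for illustration):

Plain Text
from typing import Union

from pydantic import BaseModel


class ChartAnswer(BaseModel):
    # one possible response shape (hypothetical)
    title: str
    values: list[float]


class TextAnswer(BaseModel):
    # the other possible response shape (hypothetical)
    text: str


class Response(BaseModel):
    # the union field: the LLM has to pick exactly one of the two
    # schemas, and this is where the validation errors come from
    answer: Union[ChartAnswer, TextAnswer]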
If you take a look here: https://docs.llamaindex.ai/en/stable/examples/llm/openai/#structured-prediction

I feel like OpenAI should be able to handle a scenario like this.
Also, OpenAI has a strict mode for structured output which you can use. It would increase the response time, though.

https://docs.llamaindex.ai/en/stable/examples/llm/openai/#function-calling
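
For reference, a minimal sketch of turning strict mode on through the llama_index OpenAI wrapper (assuming the strict constructor flag; Song is just a placeholder model):

Plain Text
from pydantic import BaseModel

from llama_index.llms.openai import OpenAI


class Song(BaseModel):
    title: str
    length_seconds: int


# strict=True asks OpenAI to enforce the JSON schema server-side;
# responses get slower, but malformed outputs should be rarer
llm = OpenAI(model="gpt-4o", strict=True)
sllm = llm.as_structured_llm(Song)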
@WhiteFang_Jr Should these APIs also work for AzureOpenAI LLMs?
Also, we already have it set up like this, and it's still causing a lot of validation errors:

Plain Text
from pydantic import BaseModel
from llama_index.core import VectorStoreIndex
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.llms.openai import OpenAI


def query_index[T: BaseModel](
    output_cls: type[T] | None, prompt: str, index: VectorStoreIndex, llm: OpenAI | AzureOpenAI,
) -> T | str:
    """Query the index for a given prompt and return the output."""
    if output_cls:
        llm = llm.as_structured_llm(output_cls)
    return index.as_query_engine(prompt=prompt, llm=llm).query(prompt).response
I'm also quite confused by the function calling API: is it auto-enabled when strict=True and the LLM is a structured LLM?
Strict mode won't work with nested classes, I think (or at least it was broken for me the last time I tried it).

Setting it up like that is just another way of doing the same thing; it's using llm.structured_predict, which quite literally converts the Pydantic class to an OpenAI tool and sends that over the API.
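
For example, a minimal sketch of calling it directly (Song is just a placeholder model):

Plain Text
from pydantic import BaseModel

from llama_index.core import PromptTemplate
from llama_index.llms.openai import OpenAI


class Song(BaseModel):
    title: str
    length_seconds: int


llm = OpenAI(model="gpt-4o")
# structured_predict converts Song into an OpenAI tool definition,
# forces the model to call it, then parses the arguments back
song = llm.structured_predict(Song, PromptTemplate("Write a song about {topic}."), topic="rain")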

There are a few ways you can improve this (see the sketch after the list):
  • write some custom validators for your Pydantic class to handle common errors the LLM might make
  • simplify the structure of your Pydantic models
  • catch the error and retry with some error info
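
For the first and third bullets, something along these lines (a rough sketch; Song, the validator, and the retry helper are illustrative, not a llama_index API):

Plain Text
from llama_index.core import PromptTemplate
from pydantic import BaseModel, ValidationError, field_validator


class Song(BaseModel):
    title: str
    length_seconds: int

    @field_validator("length_seconds", mode="before")
    @classmethod
    def coerce_length(cls, v):
        # LLMs sometimes answer "3:05" instead of raw seconds; normalize it
        if isinstance(v, str) and ":" in v:
            minutes, seconds = v.split(":", 1)
            return int(minutes) * 60 + int(seconds)
        return v


def predict_with_retry(llm, prompt_text: str, max_retries: int = 3) -> Song:
    """Retry the structured call, feeding the validation error back to the LLM."""
    for _ in range(max_retries):
        try:
            return llm.structured_predict(Song, PromptTemplate("{q}"), q=prompt_text)
        except ValidationError as err:
            # append the error so the model can correct itself on the next attempt
            prompt_text = f"{prompt_text}\n\nYour last answer failed validation:\n{err}"
    raise RuntimeError("Structured output still failing after retries")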
Thanks a lot. Is there any way to log what actually gets sent to OpenAI? That would make it easier to understand what is going wrong and how the output types are being sent over.
you can also peek under the hood to see the tool schema being generated

Plain Text
from llama_index.core.program.function_calling import get_function_tool

tool = get_function_tool(output_cls)
print(tool.metadata.to_openai_tool())
which will print the JSON representation of the tool that OpenAI sees
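
If you want to log the full request/response traffic as well, the built-in "simple" global handler is one option (a minimal sketch):

Plain Text
import llama_index.core

# the "simple" handler prints each LLM prompt/response pair to stdout,
# which shows roughly what is being sent to OpenAI
llama_index.core.set_global_handler("simple")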
Aaaaahhh nice
Thanks Logan