Pydantic AI

Pydantic AI builds on Pydantic, the popular Python data-modeling library widely used to define classes for structured data with type hints and validation, and applies the same approach to LLM output.

When using Pydantic AI:

You define a BaseModel class with fields for the data you expect.
You ask the LLM to produce output through a Pydantic AI agent. Pydantic AI passes your model's JSON schema to the LLM and instructs it to return JSON that adheres to your BaseModel.
Pydantic AI then takes the model’s output and tries to validate it against the Pydantic model. If validation succeeds, you get an instance of your BaseModel class with type-checked data. If it fails (invalid JSON or missing/incorrect fields), Pydantic AI can either send the error back to the model and retry, or raise an error.
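
The validate-then-retry step above can be sketched with plain Pydantic. This is only an illustration of the idea, not Pydantic AI's actual implementation: InvoiceSummary, parse_with_retry, and fake_llm are hypothetical names, and fake_llm is a stub that stands in for a real model call.

```python
from typing import Callable

from pydantic import BaseModel, ValidationError


class InvoiceSummary(BaseModel):
    # Hypothetical, trimmed-down model used only for this sketch.
    invoice_id: str
    total_amount: float


def parse_with_retry(
    ask_model: Callable[[str], str], prompt: str, max_retries: int = 1
) -> InvoiceSummary:
    """Validate the raw model text; on failure, re-ask with the error attached."""
    message = prompt
    for attempt in range(max_retries + 1):
        raw = ask_model(message)
        try:
            # Raises ValidationError for invalid JSON and for missing/incorrect fields.
            return InvoiceSummary.model_validate_json(raw)
        except ValidationError as exc:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            message = (
                f"{prompt}\n\nYour previous answer was invalid:\n{exc}\n"
                "Return corrected JSON only."
            )
    raise RuntimeError("max_retries must be >= 0")


# Stub "LLM": the first reply is missing a field, the second one is valid.
replies = iter(
    ['{"invoice_id": "2331"}', '{"invoice_id": "2331", "total_amount": 2550}']
)


def fake_llm(message: str) -> str:
    return next(replies)


summary = parse_with_retry(fake_llm, "Extract the invoice summary.")
print(summary)
```

The first reply fails validation because total_amount is missing, so the sketch re-asks once with the validation error appended to the prompt and accepts the second reply.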

Pydantic model

        ┌────────────────────────────┐
        │        Invoice             │
        │────────────────────────────│
        │ invoice_id: str            │
        │ date: str                  │
        │ subtotal: float            │
        │ tax: float                 │
        │ total_amount: float        │
        └──────┬─────────────────────┘
               │
               │
      ┌────────▼──────────┐
      │      Vendor       │
      │───────────────────│
      │ name: str         │
      │ address: str?     │
      │ gst_number: str?  │
      └───────────────────┘

               │
               │ (has many)
               ▼
      ┌──────────────────────────┐
      │       LineItem           │
      │──────────────────────────│
      │ description: str         │
      │ quantity: float          │
      │ unit_price: float        │
      │ total: float             │
      └──────────────────────────┘
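
As a rough sketch, the diagram above could translate into the following Pydantic models. The diagram does not name the fields that link Invoice to Vendor and LineItem, so vendor and line_items are assumed names, and the fields marked "str?" are read as optional strings. The date value in the usage example is a placeholder.

```python
from typing import Optional

from pydantic import BaseModel


class Vendor(BaseModel):
    name: str
    address: Optional[str] = None      # "str?" in the diagram
    gst_number: Optional[str] = None   # "str?" in the diagram


class LineItem(BaseModel):
    description: str
    quantity: float
    unit_price: float
    total: float


class Invoice(BaseModel):
    invoice_id: str
    date: str
    subtotal: float
    tax: float
    total_amount: float
    vendor: Vendor                 # assumed field name (not shown in the diagram)
    line_items: list[LineItem]     # assumed field name for the "has many" relation


# Nested dicts are validated into the nested model classes.
invoice = Invoice.model_validate(
    {
        "invoice_id": "2331",
        "date": "2024-01-15",  # placeholder value
        "subtotal": 2500,
        "tax": 50,
        "total_amount": 2550,
        "vendor": {"name": "ABC Supplies"},
        "line_items": [
            {"description": "Laptop", "quantity": 2, "unit_price": 1200, "total": 2400},
            {"description": "Mouse", "quantity": 5, "unit_price": 20, "total": 100},
        ],
    }
)
print(invoice.model_dump_json(indent=2))
```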

Pydantic AI code

from pydantic_ai import Agent
from my_models import Invoice

# output_type tells the agent to return a validated Invoice instance
agent = Agent("openai:gpt-4o-mini", output_type=Invoice)

prompt = """
Extract invoice details from the text below.
---
Invoice #2331
Vendor: ABC Supplies
Items:
  - Laptop x2 @ $1200
  - Mouse x5 @ $20
Tax: $50
Total: $2550
"""

# run_sync blocks until the run completes; agent.run is a coroutine and must be awaited
result = agent.run_sync(prompt)
print(result.output.model_dump_json(indent=2))

Pydantic models are intuitive to write, read, debug, and maintain.

If you've already used Pydantic, this will feel natural. You’re essentially treating the LLM call like a function that returns an instance of your model. Pydantic automatically coerces types where it can (e.g., if the LLM outputs “25” as a string for an integer field, Pydantic converts it to an int). It also produces clear error messages when validation fails, which can be fed back to the model so that it can correct its output.
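
A small sketch of both behaviors, using plain Pydantic (v2) and a hypothetical Item model: a numeric string is coerced to an int, and a failed validation yields structured errors that could be turned into feedback for the model.

```python
from pydantic import BaseModel, ValidationError


class Item(BaseModel):
    # Hypothetical model used only for this illustration.
    description: str
    quantity: int
    unit_price: float


# Coercion: the numeric strings are converted to int and float.
item = Item.model_validate(
    {"description": "Mouse", "quantity": "25", "unit_price": "20"}
)
print(type(item.quantity), type(item.unit_price))

# Validation failure: "five" is not an integer and unit_price is missing.
try:
    Item.model_validate({"description": "Mouse", "quantity": "five"})
except ValidationError as exc:
    errors = exc.errors()  # list of dicts with "loc", "msg" and "type" keys
    # One possible way to turn the errors into feedback text for the model.
    feedback = "; ".join(
        f"{'.'.join(str(part) for part in e['loc'])}: {e['msg']}" for e in errors
    )
    print(feedback)
```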

Pydantic AI works best when the expected output can be described by a JSON schema, ideally one without deep nesting. It is also best paired with large, capable models that mostly produce syntactically valid output on their own. For instance, you might rely on OpenAI’s function calling or on few-shot examples to get the model to produce good JSON, and let Pydantic AI validate and parse it.
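
To see what the model is being asked to satisfy, it can help to inspect the JSON schema Pydantic generates. The sketch below uses plain Pydantic (v2) with trimmed-down, hypothetical versions of the invoice models; each nested model appears as a separate entry under "$defs", which gives a quick sense of how much nesting a schema carries.

```python
import json
from typing import Optional

from pydantic import BaseModel


class Vendor(BaseModel):
    name: str
    address: Optional[str] = None


class LineItem(BaseModel):
    description: str
    quantity: float


class Invoice(BaseModel):
    invoice_id: str
    vendor: Vendor                # assumed field name
    line_items: list[LineItem]    # assumed field name


# model_json_schema() returns the JSON schema as a plain dict.
schema = Invoice.model_json_schema()

# Nested models are listed under "$defs" and referenced from the top-level properties.
nested = sorted(schema.get("$defs", {}))
print(nested)
print(json.dumps(schema["properties"], indent=2))
```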