LLM Structured Output: JSON Mode & Pydantic Guide
Parse invoices, extract contacts, and build type-safe data pipelines with JSON mode, Pydantic, and Instructor.
⚡ This post has interactive code — click ▶ Run or press Ctrl+Enter on any code block to execute it directly in your browser. The first run may take a few seconds to initialize.
You tell an LLM: “Pull the customer name and total from this invoice.” It replies with a paragraph. Nice prose. But you needed a JSON object, not a story.
You try adding “respond in JSON” to the prompt. Sometimes it works. Sometimes you get JSON wrapped in markdown. Sometimes the model ignores you.
That’s the structured output problem. LLMs speak text. Your code needs data. Today, every major provider has a fix. And a library called Instructor makes the whole thing simple.
We’ll build an invoice parser three ways. First, with OpenAI’s JSON schema mode. Then with Claude’s tool-based extraction. Finally, with Instructor for type-safe output and auto-retries. You’ll know when to reach for each one.
What Is Structured Output?
An LLM always returns a string. Structured output forces that string into a format you define — often JSON with a fixed schema.
Here’s the problem in action. You want data from this invoice:
text
Invoice #1042
Customer: Priya Sharma
Date: 2025-08-15
Items:
- Widget A x3 @ $12.00 = $36.00
- Widget B x1 @ $45.50 = $45.50
Subtotal: $81.50
Tax (8%): $6.52
Total: $88.02
Without structured output, the model says: “The invoice is for Priya Sharma, totaling $88.02…” That’s great for a human. It’s useless for code that needs invoice["total"].
With structured output, you get this instead:
json
{
"invoice_number": "1042",
"customer_name": "Priya Sharma",
"date": "2025-08-15",
"line_items": [
{"description": "Widget A", "quantity": 3, "unit_price": 12.00, "total": 36.00},
{"description": "Widget B", "quantity": 1, "unit_price": 45.50, "total": 45.50}
],
"subtotal": 81.50,
"tax": 6.52,
"total": 88.02
}
Clean. Typed. Ready for your pipeline.
KEY INSIGHT: Structured output doesn’t make the LLM smarter. It constrains the format so your code can use the response. The model still does the thinking. You just control the shape.
Three approaches exist today:
| Approach | How It Works | Works With | Validation |
|---|---|---|---|
| JSON schema mode | Provider forces valid JSON from your schema | OpenAI | Schema-level |
| Tool-based extraction | A "fake tool" whose params match your schema | All major providers | Schema-level |
| Instructor library | Pydantic model = schema; auto-retries on error | All major providers | Field-level + retries |
We’ll build the same parser with each one. Same input. Same output. Different method.
Setting Up the Project
What You Need
- Python: 3.9+
- Libraries: pydantic (2.0+), requests, instructor (1.0+)
- Install: pip install pydantic requests instructor
- API keys: OpenAI, Anthropic
- Time: 25 minutes
Every code block runs in the browser with Pyodide. We mock API calls so you don’t need a key to practice.
The first block sets up imports, the sample invoice, and mock mode. Flip MOCK_MODE to False when you’re ready for real APIs.
import micropip
await micropip.install(['requests', 'pydantic'])
import os
from js import prompt

# Keys are only needed for real API calls. In mock mode you can skip this.
OPENAI_API_KEY = prompt("Enter your OpenAI API key:") or ""
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
ANTHROPIC_API_KEY = prompt("Enter your Anthropic API key:") or ""
os.environ["ANTHROPIC_API_KEY"] = ANTHROPIC_API_KEY
import json
import os
from dataclasses import dataclass
MOCK_MODE = True # Set to False to call real APIs
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "sk-your-key")
ANTHROPIC_API_KEY = os.environ.get("ANTHROPIC_API_KEY", "sk-ant-your-key")
INVOICE_TEXT = """Invoice #1042
Customer: Priya Sharma
Date: 2025-08-15
Items:
- Widget A x3 @ $12.00 = $36.00
- Widget B x1 @ $45.50 = $45.50
Subtotal: $81.50
Tax (8%): $6.52
Total: $88.02"""
print("Setup complete.")
print(f"Mock mode: {MOCK_MODE}")
print(f"Invoice length: {len(INVOICE_TEXT)} characters")

Output:
python
Setup complete.
Mock mode: True
Invoice length: 197 characters
Next, the Pydantic models. These define what the invoice looks like as a Python object. They also serve as the JSON schema we send to the LLM.
LineItem holds one product line. Invoice holds the full document. The Field(description=...) text becomes part of the schema. The LLM reads it to decide what goes where.
from pydantic import BaseModel, Field
from typing import List
class LineItem(BaseModel):
description: str = Field(description="Product or service name")
quantity: int = Field(description="Number of units", ge=1)
unit_price: float = Field(description="Price per unit in dollars", ge=0)
total: float = Field(description="Line total: quantity * unit_price", ge=0)
class Invoice(BaseModel):
invoice_number: str = Field(description="Invoice ID number")
customer_name: str = Field(description="Full name of the customer")
date: str = Field(description="Invoice date in YYYY-MM-DD format")
line_items: List[LineItem] = Field(description="Purchased items")
subtotal: float = Field(description="Sum before tax", ge=0)
tax: float = Field(description="Tax amount in dollars", ge=0)
total: float = Field(description="Final total with tax", ge=0)
schema = Invoice.model_json_schema()
print(json.dumps(schema, indent=2)[:500])

Output:
python
{
"$defs": {
"LineItem": {
"properties": {
"description": {
"description": "Product or service name",
"title": "Description",
"type": "string"
},
"quantity": {
"description": "Number of units",
"minimum": 1,
"title": "Quantity",
"type": "integer"
},
"unit_price": {
"description": "Price per unit in dollars",
"minimum": 0,
"title": "Unit Price",
"type": "number"
},
"total": {
"description": "Line total: quantity *
That schema goes to the LLM. It sees names, types, and hints. It returns JSON to match.
TIP: Always add description to Pydantic fields for LLM work. Without it, a field named date might get "August 15, 2025" instead of "2025-08-15". The description steers the format.
Approach 1: OpenAI’s JSON Schema Mode
OpenAI calls this “Structured Outputs.” You pass a JSON schema in response_format. The API guarantees the result matches it. No broken JSON. No missing fields.
How does it work? OpenAI limits which tokens the model can pick at each step. Only tokens that keep the output valid are allowed. Required fields always show up. Types are always right. The schema is enforced during output, not after.
The function below builds the HTTP request. The key part is response_format. It holds type: "json_schema" and the schema from our Pydantic model. In mock mode, it returns a pre-built dict.
def openai_extract(text, schema, mock=True):
"""Extract data using OpenAI's response_format."""
if mock:
return {
"invoice_number": "1042",
"customer_name": "Priya Sharma",
"date": "2025-08-15",
"line_items": [
{"description": "Widget A", "quantity": 3,
"unit_price": 12.0, "total": 36.0},
{"description": "Widget B", "quantity": 1,
"unit_price": 45.5, "total": 45.5}
],
"subtotal": 81.5, "tax": 6.52, "total": 88.02
}
import requests
headers = {
"Content-Type": "application/json",
"Authorization": f"Bearer {OPENAI_API_KEY}"
}
payload = {
"model": "gpt-4o",
"messages": [
{"role": "system", "content": "Extract invoice data."},
{"role": "user", "content": text}
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "invoice_extraction",
"strict": True,
"schema": schema
}
}
}
resp = requests.post(
"https://api.openai.com/v1/chat/completions",
headers=headers, json=payload
)
content = resp.json()["choices"][0]["message"]["content"]
    return json.loads(content)

Call it and validate with Pydantic. The Invoice(**data) line runs every check: types, ranges, required fields. Bad data raises a clear error.
raw_data = openai_extract(INVOICE_TEXT, schema, mock=MOCK_MODE)
invoice = Invoice(**raw_data)
print(f"Customer: {invoice.customer_name}")
print(f"Items: {len(invoice.line_items)}")
print(f"Total: ${invoice.total:.2f}")
print(f"\nFull object:\n{invoice.model_dump_json(indent=2)}")

Output:
python
Customer: Priya Sharma
Items: 2
Total: $88.02
Full object:
{
"invoice_number": "1042",
"customer_name": "Priya Sharma",
"date": "2025-08-15",
"line_items": [
{
"description": "Widget A",
"quantity": 3,
"unit_price": 12.0,
"total": 36.0
},
{
"description": "Widget B",
"quantity": 1,
"unit_price": 45.5,
"total": 45.5
}
],
"subtotal": 81.5,
"tax": 6.52,
"total": 88.02
}
The model returns JSON. Pydantic checks it. You get a typed object.
One detail matters: set "strict": True in the schema config. Without it, the model tries to follow the schema. With it, the schema is enforced. “Tries” vs. “guaranteed” — big difference in a pipeline.
WARNING: OpenAI’s strict mode doesn’t support every JSON Schema feature. Default values, oneOf, and pattern fields aren’t allowed. If your Pydantic model uses Optional, mark those fields with a union type so they serialize as a type/null union. See the OpenAI docs for what’s allowed.
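For example, since strict mode requires every property to be listed as required, an "optional" value is expressed as a nullable type rather than by omitting the field. A sketch (the field name here is made up for illustration):

```python
# In strict mode, "may be absent" becomes "may be null": the field is
# always present in the output, but its type is a union with null.
optional_field = {
    "type": ["string", "null"],          # union with null
    "description": "PO number if present",
}
print(optional_field["type"])
```

Pydantic produces this shape automatically when you declare a field as `Optional[str]` with no default, which is why union types are the recommended workaround.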
Approach 2: Claude’s Tool-Based Extraction
Claude uses a different path. You define a “tool” whose input schema matches the data you want. Then you tell Claude to call it. The model “calls” the tool with your extracted data as arguments.
Why does this work? When Claude calls a tool, it has to produce valid JSON for that tool’s parameters. We exploit that. We create a tool called extract_invoice with our Invoice schema. Claude fills in the fields. We grab those fields. Done.
The function below does this. The tools array holds one tool. The tool_choice parameter forces Claude to use it. No text-only escape route.
def claude_extract(text, schema, mock=True):
"""Extract data using Claude's tool-based approach."""
if mock:
return {
"invoice_number": "1042",
"customer_name": "Priya Sharma",
"date": "2025-08-15",
"line_items": [
{"description": "Widget A", "quantity": 3,
"unit_price": 12.0, "total": 36.0},
{"description": "Widget B", "quantity": 1,
"unit_price": 45.5, "total": 45.5}
],
"subtotal": 81.5, "tax": 6.52, "total": 88.02
}
import requests
headers = {
"Content-Type": "application/json",
"x-api-key": ANTHROPIC_API_KEY,
"anthropic-version": "2023-06-01"
}
payload = {
"model": "claude-sonnet-4-20250514",
"max_tokens": 1024,
"tools": [{
"name": "extract_invoice",
"description": "Extract structured invoice data",
"input_schema": schema
}],
"tool_choice": {"type": "tool", "name": "extract_invoice"},
"messages": [
{"role": "user",
"content": f"Extract the invoice data:\n\n{text}"}
]
}
resp = requests.post(
"https://api.anthropic.com/v1/messages",
headers=headers, json=payload
)
data = resp.json()
tool_block = [b for b in data["content"]
if b["type"] == "tool_use"][0]
    return tool_block["input"]

Same pattern: extract, validate, use.
raw_data = claude_extract(INVOICE_TEXT, schema, mock=MOCK_MODE)
invoice = Invoice(**raw_data)
print(f"Customer: {invoice.customer_name}")
print(f"Items: {len(invoice.line_items)}")
print(f"Total: ${invoice.total:.2f}")Output:
python
Customer: Priya Sharma
Items: 2
Total: $88.02
Same result. Different engine. The tool method works with any provider that supports tool calling. That’s OpenAI, Claude, Gemini, Mistral, and local models via Ollama.
KEY INSIGHT: Tool-based extraction is the universal approach. Every major LLM supports tool calling. If you need one function that works across providers, tools are the way.
There’s a real gap though. OpenAI’s response_format enforces the schema while the model writes. Tool calling checks the output after. Both give you valid JSON. But response_format handles deeply nested schemas more reliably.
Approach 3: The Instructor Library
Instructor takes the best of both methods and wraps them in a clean API. You define a Pydantic model. You call create(). You get back a typed Python object. No HTTP code. No JSON parsing.
But the real power is retries. Say the model returns a string where you needed an int. Instructor catches the error. It sends it back to the model. The model fixes it. This loop runs up to max_retries times.
Here’s the setup. Instructor wraps your LLM client. The from_provider() function picks the right backend.
def instructor_extract(text, mock=True):
"""Extract using the Instructor library."""
if mock:
return Invoice(
invoice_number="1042",
customer_name="Priya Sharma",
date="2025-08-15",
line_items=[
LineItem(description="Widget A", quantity=3,
unit_price=12.0, total=36.0),
LineItem(description="Widget B", quantity=1,
unit_price=45.5, total=45.5)
],
subtotal=81.5, tax=6.52, total=88.02
)
import instructor
client = instructor.from_provider("openai/gpt-4o")
return client.chat.completions.create(
response_model=Invoice,
messages=[
{"role": "system",
"content": "Extract invoice data from the text."},
{"role": "user", "content": text}
],
max_retries=3
    )

The max_retries=3 is where Instructor earns its keep.
invoice = instructor_extract(INVOICE_TEXT, mock=MOCK_MODE)
print(f"Customer: {invoice.customer_name}")
print(f"Items: {len(invoice.line_items)}")
print(f"Total: ${invoice.total:.2f}")
print(f"\nType: {type(invoice).__name__}")
print(f"Quantity type: {type(invoice.line_items[0].quantity).__name__}")

Output:
python
Customer: Priya Sharma
Items: 2
Total: $88.02
Type: Invoice
Quantity type: int
See that? invoice is already an Invoice object. Not a dict. You use dot notation. Every field has the right type.
My rule: if you’re pulling data from more than a few docs, use Instructor. The retry logic alone saves hours. For a quick test, raw HTTP works fine.
Handling Validation Failures and Retries
What if the model messes up? It puts “three” instead of 3. Or it skips a required field. Without retries, your pipeline crashes.
Instructor handles this for you. But knowing how the retry loop works helps you write better models:
- Instructor sends the prompt with your schema
- The model returns JSON
- Pydantic checks the JSON against your model
- Pass? Done. Return the object.
- Fail? Instructor grabs the error message and sends it back to the model
- The model reads the error, fixes the issue, and tries again
- Repeat up to max_retries times
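The loop above can be sketched in miniature. This is not Instructor's actual code: fake_model is a stub invented for illustration that returns bad data once, then "reads" the error and corrects itself, the way a real LLM tends to on retry.

```python
import json

def fake_model(messages):
    # Stub LLM: wrong type on the first attempt, fixed after it sees
    # the validation error appended to the conversation.
    if any("Validation failed" in m["content"] for m in messages):
        return '{"quantity": 3}'          # corrected on retry
    return '{"quantity": "three"}'        # first attempt: wrong type

def validate(raw):
    data = json.loads(raw)
    if not isinstance(data["quantity"], int):
        raise ValueError(
            f"Validation failed: quantity must be an integer, "
            f"got {data['quantity']!r}"
        )
    return data

def extract_with_retries(max_retries=3):
    messages = [{"role": "user", "content": "Extract the quantity as JSON."}]
    for attempt in range(max_retries):
        raw = fake_model(messages)
        try:
            return validate(raw)
        except ValueError as e:
            # Feed the error text back so the model can self-correct.
            messages.append({"role": "user", "content": str(e)})
    raise RuntimeError("All retries exhausted")

print(extract_with_retries())  # {'quantity': 3}
```

The error message itself is the retry prompt. That is why the next section stresses writing clear validator messages.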
Let’s see validation in action. We’ll make a stricter model that checks the math. The line total must equal quantity times price.
from pydantic import field_validator
class StrictLineItem(BaseModel):
description: str
quantity: int = Field(ge=1)
unit_price: float = Field(ge=0)
total: float = Field(ge=0)
@field_validator("total")
@classmethod
def check_math(cls, v, info):
qty = info.data.get("quantity", 0)
price = info.data.get("unit_price", 0)
expected = round(qty * price, 2)
if abs(v - expected) > 0.01:
raise ValueError(
f"total {v} != quantity({qty}) * "
f"unit_price({price}) = {expected}"
)
return v
good_item = StrictLineItem(
description="Widget A", quantity=3,
unit_price=12.0, total=36.0
)
print(f"Valid item: {good_item.description} - ${good_item.total}")
try:
bad_item = StrictLineItem(
description="Widget A", quantity=3,
unit_price=12.0, total=99.0
)
except Exception as e:
    print(f"\nValidation error: {e}")

Output:
python
Valid item: Widget A - $36.0
Validation error: 1 validation error for StrictLineItem
total
Value error, total 99.0 != quantity(3) * unit_price(12.0) = 36.0 [type=value_error, input_value=99.0, input_type=float]
For further information visit https://errors.pydantic.dev/2.10/v/value_error
The check catches wrong math. With Instructor, that error goes straight back to the model. The model reads “total 99.0 != quantity(3) * unit_price(12.0) = 36.0” and fixes it.
TIP: Write clear validator error messages. The model reads them during retries. “Validation failed” gives no help. “total 99.0 != quantity(3) * unit_price(12.0) = 36.0” tells the model exactly what’s wrong.
Comparing the Three Approaches
You’ve seen all three. Which should you pick?
| Feature | OpenAI Schema | Tool Extraction | Instructor |
|---|---|---|---|
| Providers | OpenAI only | All major ones | All major ones |
| Schema enforcement | During generation | After generation | After + retries |
| Built-in retries | No | No | Yes |
| Lines of code | ~20 | ~20 | ~5 |
| Nested schemas | Strongest | Good | Good |
| Best for | OpenAI pipelines | Multi-provider | Production |
Here’s my take:
- Pick OpenAI schema mode when you’re on OpenAI only. It has the best guarantee for complex schemas.
- Pick tool extraction when you need to swap providers. Same pattern everywhere.
- Pick Instructor for real work. Retries, checks, and clean API save real time.
You can mix them too. Instructor uses response_format on OpenAI and tools for Claude. Best of both worlds, no code change.
Building the Full Invoice Parser
Let’s wire it all into one pipeline. The parser takes raw text, picks a method, checks the result, and prints a summary.
The parse_invoice() function routes to the right extractor based on method. The Pydantic check is the same every time.
def parse_invoice(text, method="instructor", mock=True):
"""Parse invoice text into a validated Invoice object."""
extractors = {
"openai": lambda t: openai_extract(t, schema, mock),
"claude": lambda t: claude_extract(t, schema, mock),
"instructor": lambda t: instructor_extract(t, mock),
}
if method not in extractors:
raise ValueError(f"Unknown method: {method}")
print(f"Extracting with: {method}")
result = extractors[method](text)
if isinstance(result, Invoice):
invoice = result
else:
invoice = Invoice(**result)
print(f"Customer: {invoice.customer_name}")
print(f"Invoice #: {invoice.invoice_number}")
print(f"Date: {invoice.date}")
print(f"Items:")
for item in invoice.line_items:
print(f" - {item.description}: "
f"{item.quantity} x ${item.unit_price:.2f} "
f"= ${item.total:.2f}")
print(f"Subtotal: ${invoice.subtotal:.2f}")
print(f"Tax: ${invoice.tax:.2f}")
print(f"Total: ${invoice.total:.2f}")
    return invoice

Run all three methods on the same invoice.
for method in ["openai", "claude", "instructor"]:
print(f"\n{'='*40}")
invoice = parse_invoice(INVOICE_TEXT, method=method, mock=MOCK_MODE)
    print(f"{'='*40}")

Output:
python
========================================
Extracting with: openai
Customer: Priya Sharma
Invoice #: 1042
Date: 2025-08-15
Items:
- Widget A: 3 x $12.00 = $36.00
- Widget B: 1 x $45.50 = $45.50
Subtotal: $81.50
Tax: $6.52
Total: $88.02
========================================
========================================
Extracting with: claude
Customer: Priya Sharma
Invoice #: 1042
Date: 2025-08-15
Items:
- Widget A: 3 x $12.00 = $36.00
- Widget B: 1 x $45.50 = $45.50
Subtotal: $81.50
Tax: $6.52
Total: $88.02
========================================
========================================
Extracting with: instructor
Customer: Priya Sharma
Invoice #: 1042
Date: 2025-08-15
Items:
- Widget A: 3 x $12.00 = $36.00
- Widget B: 1 x $45.50 = $45.50
Subtotal: $81.50
Tax: $6.52
Total: $88.02
========================================
Same result from all three. The method changes. The checks stay the same. That’s the beauty of Pydantic as the shared layer.
Common Mistakes and How to Fix Them
Mistake 1: Skipping strict: True in OpenAI
Without strict: True, the schema is a suggestion. The model tries to follow it. But “tries” can fail on deeply nested objects.
bad_format = {
"type": "json_schema",
"json_schema": {
"name": "invoice",
"schema": schema
}
}
print("Missing 'strict': schema is a suggestion only")
good_format = {
"type": "json_schema",
"json_schema": {
"name": "invoice",
"strict": True,
"schema": schema
}
}
print("With 'strict': True — schema is enforced")

Output:
python
Missing 'strict': schema is a suggestion only
With 'strict': True — schema is enforced
Mistake 2: Trusting the JSON Without Validation
The model returned JSON. But is the data correct? A model might say {"quantity": -5}. Valid JSON. Bad data. Always validate with Pydantic.
raw = {"invoice_number": "1042", "customer_name": "Test",
"date": "2025-08-15", "line_items": [],
"subtotal": 81.5, "tax": 6.52, "total": -10.0}
try:
bad_invoice = Invoice(**raw)
print("This should not print")
except Exception as e:
error_str = str(e)
    print(f"Caught bad data: {error_str[:120]}")

Output:
python
Caught bad data: 1 validation error for Invoice
total
Input should be greater than or equal to 0 [type=greater_than_equal, input_value=-10.0
Mistake 3: Using JSON Mode Instead of Structured Outputs
OpenAI’s older "type": "json_object" gives you valid JSON. That’s it. It could return {"note": "here's the data"} and JSON mode would be happy. It doesn’t check your schema.
old_way = {"type": "json_object"}
print(f"JSON mode: {old_way}")
print("Gives you: valid JSON")
print("Does NOT give you: schema match")
new_way = {
"type": "json_schema",
"json_schema": {"name": "x", "strict": True, "schema": schema}
}
print(f"\nStructured output type: {new_way['type']}")
print("Gives you: valid JSON + schema match")

Output:
python
JSON mode: {'type': 'json_object'}
Gives you: valid JSON
Does NOT give you: schema match
Structured output type: json_schema
Gives you: valid JSON + schema match
Always pick json_schema over json_object. The newer option does everything the old one does, plus schema enforcement.
When NOT to Use Structured Output
Structured output isn’t always the right call. Skip it in these cases.
Creative text. If you want a poem or a story, JSON hurts quality. Schema limits what the model can express.
Simple yes/no answers. A one-token response with max_tokens=1 is faster and cheaper than a full schema. Don’t bring heavy tools to a light job.
Streaming chatbots. Some modes buffer the full reply before sending it. If you need token-by-token output, check your provider first. OpenAI and Claude both support it. Not all do.
NOTE: Instructor supports streaming via create_partial. You get partial objects as tokens arrive. Handy for showing progress in a UI while still getting full checks at the end.
Exercises
Exercise 1: Extract Contact Information
You’ve parsed invoices. Now try a different task. Define a Pydantic model for a contact card and write the extraction logic.
# Exercise 1: Define a Contact model and extract data
class Contact(BaseModel):
name: str = Field(description="Full name")
email: str = Field(description="Email address")
phone: str = Field(description="Phone number")
company: str = Field(description="Company name")
sample_text = """
Hi, I'm Raj Patel from DataFlow Inc.
Reach me at raj@dataflow.io or call 555-0142.
"""
# TODO: Write extract_contact(text) that returns a Contact.
# Use the mock approach from the tutorial.
# Print the contact's name and email.
# Your code here:
Exercise 2: Add a Custom Validator
The invoice parser trusts the model’s math. But what if the subtotal is wrong? Add a validator that checks whether the subtotal equals the sum of line item totals.
# Exercise 2: Add a subtotal validator
class ValidatedInvoice(BaseModel):
invoice_number: str
customer_name: str
date: str
line_items: List[LineItem]
subtotal: float = Field(ge=0)
tax: float = Field(ge=0)
total: float = Field(ge=0)
# TODO: Add a @field_validator for "subtotal"
# Check: sum of line_items[i].total == subtotal
# Allow $0.01 tolerance. Raise ValueError if off.
test_data = {
"invoice_number": "1042",
"customer_name": "Test",
"date": "2025-08-15",
"line_items": [
{"description": "A", "quantity": 2,
"unit_price": 10.0, "total": 20.0},
{"description": "B", "quantity": 1,
"unit_price": 5.0, "total": 5.0}
],
"subtotal": 25.0,
"tax": 2.0,
"total": 27.0
}
Summary
You’ve built an invoice parser three ways. Here’s what each one gives you.
OpenAI’s JSON schema mode enforces your schema during token generation. Strongest guarantee. But only works with OpenAI.
Tool-based extraction works everywhere. Define a tool with your schema. The model fills in the params. Universal.
Instructor wraps both in a clean API. Pydantic model in, validated object out. Auto-retries on bad data. Best for production.
The common thread is Pydantic. It defines your schema. It checks the output. It gives you typed objects. Learn Pydantic well and you can swap methods or providers without touching your data layer.
Practice Exercise
Build a batch extractor. Take a list of invoice texts and extract each one. If one fails, log the error and keep going. Count successes and failures.
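Here is a shape to start from. It is a sketch, not a finished answer: parse_one() below is a trivial stand-in invented in place of the real parse_invoice(), so the error-handling pattern is the focus.

```python
def parse_one(text):
    # Stand-in for parse_invoice(): just requires a "Total:" line.
    for line in text.splitlines():
        if line.startswith("Total:"):
            return {"total": float(line.split("$")[1])}
    raise ValueError("no Total line found")

def parse_batch(texts):
    results, failures = [], []
    for i, text in enumerate(texts):
        try:
            results.append(parse_one(text))
        except Exception as e:
            failures.append((i, str(e)))   # log the error and keep going
    return results, failures

docs = ["Invoice #1\nTotal: $10.00", "garbage with no total"]
ok, bad = parse_batch(docs)
print(f"Parsed: {len(ok)}, Failed: {len(bad)}")  # Parsed: 1, Failed: 1
```

Swap parse_one() for your real extractor and the structure holds: one bad document should never take down the whole batch.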
Error Troubleshooting
Common errors and fixes for structured output pipelines.
ValidationError: 1 validation error for Invoice
The model’s output didn’t match your Pydantic model. Check which field failed. Usual causes: string instead of number, or a missing field. Fix: add description to fields, or use Instructor with max_retries=3.
json.JSONDecodeError: Expecting value
The API didn’t return JSON. This happens with JSON mode when the model wraps its output in markdown. Fix: use json_schema mode, not json_object.
KeyError: 'choices' or KeyError: 'content'
The API returned an error, not a completion. Check resp.json() for an "error" key. Causes: bad API key, rate limit, or bad request. Fix: check the error message first.
instructor.exceptions.InstructorRetryException
Instructor used all retries and still failed. Fix: make the model simpler, raise max_retries, or add more context to the prompt.
Frequently Asked Questions
Can I use structured output with open-source models?
Yes. Run Llama or Mistral through Ollama or vLLM. Both support JSON-schema-constrained generation. Instructor works with Ollama too — use instructor.from_provider("ollama/llama3"). Your Pydantic model stays the same.
How does structured output affect cost?
The schema doesn’t count as input tokens. But JSON output is longer than plain text — keys, braces, and quotes add up. Expect 20-50% more output tokens for the same data. Nested schemas cost more.
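You can get a rough feel for the overhead by comparing lengths. This counts characters rather than real tokens, so treat it as a loose proxy only:

```python
import json

# The same facts as prose vs. as schema-shaped JSON. The keys, quotes,
# and braces are pure overhead, and they repeat for every record.
plain = "Priya Sharma owes $88.02 on invoice 1042."
structured = json.dumps({
    "customer_name": "Priya Sharma",
    "invoice_number": "1042",
    "total": 88.02,
})
print(len(plain), len(structured))
print(f"Overhead: {len(structured) / len(plain):.1f}x")
```

Shorter key names shave tokens at scale, at the cost of readability. Measure before optimizing.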
What’s the difference between JSON mode and structured outputs?
JSON mode (type: "json_object") gives you valid JSON. Any JSON. The model could return {"note": "hello"} and that’s fine. Structured outputs (type: "json_schema") give you JSON that matches your schema. Required fields are there. Types are right. Big gap.
Does Instructor work with Claude and Gemini?
Yes. It supports 15+ providers. Use instructor.from_provider("anthropic/claude-sonnet-4-20250514") for Claude or instructor.from_provider("google/gemini-2.0-flash") for Gemini. Your code stays the same.
What if the model can’t fill a required field?
With OpenAI schemas, the model always fills required fields. But it may guess wrong. With Instructor, if a check catches a bad guess, the retry loop tries again. For data that might not be in the source, use Optional fields.
References
- OpenAI documentation — Structured Outputs
- Anthropic documentation — Structured Outputs with Claude
- Instructor library — Official docs
- Pydantic documentation — JSON schema generation
- OpenAI blog — Introducing Structured Outputs
- Anthropic cookbook — Extracting structured JSON with tool use
- Pydantic blog — How to Use Pydantic for LLMs