
Structured Output and Self-Correcting Agents in LangGraph

Written by Selva Prabhakaran | 35 min read


Your agent generates a beautifully worded response. But your downstream code expects a JSON object with specific fields — not prose. You parse the response, pray the keys exist, and wrap everything in a try/except. One missing field, one wrong type, and your pipeline breaks.

This is the structured output problem. LLMs produce text. Your application needs data. Bridging that gap reliably is what separates demo agents from production agents.

Here’s how the pieces fit together. You start by defining a Pydantic model — a Python class that describes the exact shape of the data you want. You attach that model to the LLM using .with_structured_output(), which tells the model to return data matching your schema instead of free-form text.

But LLMs aren’t perfect. Sometimes the response fails validation — a field has the wrong type, a value breaks a business rule, or a required field is missing. That’s where LangGraph comes in. You build a graph where an extraction node tries to parse the LLM’s response, a routing function checks whether it succeeded, and a conditional edge loops back for another attempt if it didn’t. The LLM gets the validation error in its next prompt, so it knows exactly what to fix.

After a few sections on schema basics, we’ll build that self-correcting loop from scratch.

What Is Structured Output?

Structured output means the LLM returns data in a predefined format instead of free-form text. You define a schema — the exact fields, their types, and constraints. The model’s response must conform to that schema.

Why does this matter? Because every downstream system in your pipeline expects specific data shapes. A database insert needs exact column types. An API call needs specific parameters. A UI component needs known fields to render.

Without structured output, you’re parsing natural language with regex and string splitting. That’s fragile. With structured output, the LLM gives you a Pydantic model you can use directly.
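To see the contrast in miniature, here's a sketch that validates a raw payload with Pydantic instead of hand-parsing prose. The two-field `Company` schema is a hypothetical stand-in, not a schema from this article:

```python
from pydantic import BaseModel, ValidationError

class Company(BaseModel):
    name: str
    employee_count: int

# A well-formed payload parses straight into typed attributes
good = Company.model_validate({"name": "Tesla", "employee_count": 140000})
print(good.employee_count)

# A malformed payload fails loudly at the boundary, not deep in the pipeline
try:
    Company.model_validate({"name": "Tesla", "employee_count": "lots"})
except ValidationError as err:
    print(f"caught {err.error_count()} validation error")
```

The failure happens at the one place you expect it, with a message naming the exact field, instead of a `KeyError` three functions downstream.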

python
import os
from dotenv import load_dotenv
from pydantic import BaseModel, Field, field_validator
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage
from langgraph.graph import StateGraph, MessagesState, START, END
from typing import TypedDict, Annotated, Optional, Literal
import operator

load_dotenv()

llm = ChatOpenAI(model="gpt-4o-mini")
print("Environment ready")
python
Environment ready

Prerequisites

  • Python version: 3.10+
  • Required libraries: langchain-openai (0.2+), langgraph (0.2+), pydantic (2.0+), python-dotenv
  • Install: pip install langchain-openai langgraph pydantic python-dotenv
  • API Key: Set OPENAI_API_KEY in a .env file
  • Time to complete: 25-30 minutes

Pydantic Models as Output Schemas

The foundation of structured output is Pydantic. You define a Python class that describes the exact shape of your LLM’s response. Each field gets a name, a type, and a description that tells the model what to put there.

Here’s a schema for extracting company information from text. Notice how each Field includes a description — this gets injected into the model’s prompt and directly affects output quality.

python
class CompanyInfo(BaseModel):
    """Extract structured company information from text."""
    name: str = Field(description="Official company name")
    industry: str = Field(description="Primary industry sector")
    employee_count: Optional[int] = Field(
        default=None,
        description="Number of employees, if mentioned"
    )
    headquarters: str = Field(
        description="City and country of headquarters"
    )
    key_products: list[str] = Field(
        description="Main products or services, max 5"
    )

print(CompanyInfo.model_json_schema())
python
{'description': 'Extract structured company information from text.', 'properties': {'name': {'description': 'Official company name', 'title': 'Name', 'type': 'string'}, 'industry': {'description': 'Primary industry sector', 'title': 'Industry', 'type': 'string'}, 'employee_count': {'anyOf': [{'type': 'integer'}, {'type': 'null'}], 'default': None, 'description': 'Number of employees, if mentioned', 'title': 'Employee Count'}, 'headquarters': {'description': 'City and country of headquarters', 'title': 'Headquarters', 'type': 'string'}, 'key_products': {'description': 'Main products or services, max 5', 'items': {'type': 'string'}, 'title': 'Key Products', 'type': 'array'}}, 'required': ['name', 'industry', 'headquarters', 'key_products'], 'title': 'CompanyInfo', 'type': 'object'}

That JSON schema is exactly what gets sent to the LLM’s API. The model reads those field descriptions and generates output to match. Vague descriptions like description="company info" produce wildly inconsistent results. Be specific.

KEY INSIGHT: Pydantic field descriptions are prompt engineering in disguise. The LLM reads them to decide what to extract. Specific descriptions produce accurate, consistent results. Vague ones produce garbage.

Using with_structured_output()

LangChain’s .with_structured_output() is the simplest way to get typed responses from an LLM. Pass your Pydantic model, and the method handles format instructions, parsing, and type coercion automatically.

Under the hood, it uses the model’s native function-calling API. Your Pydantic schema becomes a function definition. The model returns a function call with arguments matching your schema. LangChain parses those arguments into a Pydantic object.

python
structured_llm = llm.with_structured_output(CompanyInfo)

result = structured_llm.invoke(
    "Tesla was founded by Elon Musk and is headquartered in "
    "Austin, Texas, USA. The electric vehicle company has about "
    "140,000 employees and makes the Model 3, Model Y, Model S, "
    "Model X, and Cybertruck."
)

print(f"Name: {result.name}")
print(f"Industry: {result.industry}")
print(f"Employees: {result.employee_count}")
print(f"HQ: {result.headquarters}")
print(f"Products: {result.key_products}")

The result looks like this:

python
Name: Tesla
Industry: Electric Vehicles
Employees: 140000
HQ: Austin, Texas, USA
Products: ['Model 3', 'Model Y', 'Model S', 'Model X', 'Cybertruck']

The return value isn’t a string — it’s a CompanyInfo object. You can access .name, .industry, and every other field directly. No parsing, no regex, no json.loads().
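Because the result is a plain Pydantic object, handing it to downstream systems is one method call. A sketch using a locally constructed object rather than a live LLM response (the schema mirrors the `CompanyInfo` model above):

```python
from typing import Optional
from pydantic import BaseModel, Field

class CompanyInfo(BaseModel):
    name: str = Field(description="Official company name")
    industry: str = Field(description="Primary industry sector")
    employee_count: Optional[int] = Field(default=None)
    headquarters: str = Field(description="City and country of headquarters")
    key_products: list[str] = Field(description="Main products or services")

result = CompanyInfo(
    name="Tesla",
    industry="Electric Vehicles",
    employee_count=140000,
    headquarters="Austin, Texas, USA",
    key_products=["Model 3", "Model Y"],
)

row = result.model_dump()           # plain dict, ready for a DB insert or API call
payload = result.model_dump_json()  # JSON string, ready for the wire
print(row["employee_count"])
print(type(payload).__name__)
```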

TIP: Pass include_raw=True to .with_structured_output() when debugging. It returns both the parsed Pydantic object and the raw LLM response, so you can see exactly what the model generated before parsing.

When Validation Goes Beyond Type Checking

.with_structured_output() handles type coercion well. But what if the LLM returns a quarter value of “Q7” or a negative revenue figure? Those pass type checking but violate your business rules.

Pydantic’s @field_validator catches these problems. Here’s a financial report schema with custom validators that enforce real-world constraints.

python
class FinancialReport(BaseModel):
    """Extract quarterly financial data from earnings text."""
    company: str = Field(description="Company name")
    quarter: str = Field(
        description="Quarter, e.g. Q1, Q2, Q3, Q4"
    )
    year: int = Field(description="Fiscal year")
    revenue_millions: float = Field(
        description="Revenue in millions of dollars"
    )
    profit_millions: float = Field(
        description="Net profit in millions of dollars"
    )

    @field_validator("quarter")
    @classmethod
    def validate_quarter(cls, v):
        valid = {"Q1", "Q2", "Q3", "Q4"}
        if v not in valid:
            raise ValueError(
                f"Quarter must be one of {valid}, got '{v}'"
            )
        return v

    @field_validator("revenue_millions")
    @classmethod
    def validate_revenue(cls, v):
        if v < 0:
            raise ValueError(
                f"Revenue cannot be negative, got {v}"
            )
        return v

print("FinancialReport schema with validators defined")
python
FinancialReport schema with validators defined

What happens when bad data hits these validators? Let’s test with an invalid quarter.

python
try:
    bad_report = FinancialReport(
        company="Test Corp",
        quarter="Q7",
        year=2025,
        revenue_millions=100.0,
        profit_millions=10.0,
    )
except Exception as e:
    print(f"Validation error: {e}")

The validator rejects it immediately:

python
Validation error: 1 validation error for FinancialReport
quarter
  Value error, Quarter must be one of {'Q1', 'Q2', 'Q3', 'Q4'}, got 'Q7' [type=value_error, input_value='Q7', url=https://errors.pydantic.dev/2.11/v/value_error]

Good — the validator caught the bad quarter value. But when this happens inside a LangGraph agent, you don’t want a crash. You want the agent to see the error and try again with corrected output.

That’s the self-correcting pattern we’ll build next.

Building a Self-Correcting Extraction Node

Here’s the core idea. You create a LangGraph node that attempts structured extraction, catches validation errors, feeds the error message back to the LLM, and lets it try again. A conditional edge loops until the output validates or you hit a retry limit.

The graph has three parts.

Extraction node: calls the LLM with your Pydantic schema. If extraction succeeds, it stores the result. If it fails, it stores the error message and increments a retry counter.

Routing function: checks the state. Did extraction succeed? Go to output. Did it fail but retries remain? Loop back. Out of retries? Go to fallback.

Fallback node: handles the case where the LLM can’t produce valid output after multiple attempts. Returns a partial result or error your application can handle gracefully.

python
class ExtractionState(TypedDict):
    messages: Annotated[list, operator.add]
    extraction_result: Optional[dict]
    validation_error: Optional[str]
    retry_count: int
    max_retries: int

The state tracks everything the self-correcting loop needs: the conversation history, the extraction result (if successful), the last validation error (if not), and counters for the retry logic.

Now the extraction node itself. When retrying, it appends the validation error to the conversation so the LLM knows what went wrong. This is the key difference between a useful retry and a random retry.

python
def extraction_node(state: ExtractionState):
    """Try to extract structured data, with error feedback on retries."""
    messages = state["messages"]
    retry_count = state.get("retry_count", 0)

    if state.get("validation_error"):
        error_msg = state["validation_error"]
        feedback = HumanMessage(
            content=(
                f"Your previous response failed validation: "
                f"{error_msg}\n\n"
                f"Please fix the errors and try again. "
                f"Return valid structured data."
            )
        )
        messages = messages + [feedback]

    try:
        structured_llm = llm.with_structured_output(
            FinancialReport
        )
        result = structured_llm.invoke(messages)
        return {
            "messages": [
                AIMessage(
                    content=f"Extracted: {result.model_dump_json()}"
                )
            ],
            "extraction_result": result.model_dump(),
            "validation_error": None,
            "retry_count": retry_count,
        }
    except Exception as e:
        return {
            "messages": [
                AIMessage(
                    content=f"Extraction failed: {str(e)}"
                )
            ],
            "extraction_result": None,
            "validation_error": str(e),
            "retry_count": retry_count + 1,
        }

print("Extraction node defined")
python
Extraction node defined

The node does two jobs. First, it calls the LLM with structured output enabled. Second, if a previous attempt failed, it includes the validation error so the LLM knows what to fix. That feedback loop is what makes self-correction actually work.

WARNING: Always set a retry limit. Without one, a self-correcting loop can run until LangGraph’s recursion limit kicks in with a cryptic error. Three retries is a practical default — beyond that, the schema or prompt probably needs redesigning.

Wiring the Self-Correcting Graph

The routing function checks three conditions: success, retries remaining, and retries exhausted. It returns one of three node names.

python
def route_extraction(
    state: ExtractionState,
) -> Literal["extract", "output", "fallback"]:
    """Route based on extraction success or failure."""
    if state.get("extraction_result") is not None:
        return "output"
    if state.get("retry_count", 0) >= state.get("max_retries", 3):
        return "fallback"
    return "extract"


def output_node(state: ExtractionState):
    """Process successful extraction."""
    result = state["extraction_result"]
    summary = (
        f"Successfully extracted: {result['company']} "
        f"{result['quarter']} {result['year']} — "
        f"Revenue: ${result['revenue_millions']}M, "
        f"Profit: ${result['profit_millions']}M"
    )
    return {"messages": [AIMessage(content=summary)]}


def fallback_node(state: ExtractionState):
    """Handle extraction failure after max retries."""
    retries = state.get("retry_count", 0)
    last_error = state.get("validation_error", "Unknown")
    return {
        "messages": [
            AIMessage(
                content=(
                    f"Extraction failed after {retries} attempts. "
                    f"Last error: {last_error}. "
                    f"Returning empty result."
                )
            )
        ],
        "extraction_result": {},
    }

print("Routing and handler nodes defined")
python
Routing and handler nodes defined

Now connect everything. The conditional edge after the extraction node is the critical piece — it’s what creates the retry loop.

python
graph_builder = StateGraph(ExtractionState)

graph_builder.add_node("extract", extraction_node)
graph_builder.add_node("output", output_node)
graph_builder.add_node("fallback", fallback_node)

graph_builder.add_edge(START, "extract")
graph_builder.add_conditional_edges(
    "extract",
    route_extraction,
    {
        "extract": "extract",
        "output": "output",
        "fallback": "fallback",
    },
)
graph_builder.add_edge("output", END)
graph_builder.add_edge("fallback", END)

extraction_graph = graph_builder.compile()
print("Self-correcting extraction graph compiled")
python
Self-correcting extraction graph compiled

Let’s test it with a real earnings snippet. The graph extracts structured financial data, validates it against our Pydantic model, and retries if anything fails.

python
test_input = {
    "messages": [
        SystemMessage(
            content="Extract financial data from the user's text."
        ),
        HumanMessage(
            content=(
                "Apple reported Q2 2025 earnings yesterday. "
                "Revenue came in at $94.8 billion for the quarter, "
                "with net profit of $24.2 billion."
            )
        ),
    ],
    "extraction_result": None,
    "validation_error": None,
    "retry_count": 0,
    "max_retries": 3,
}

result = extraction_graph.invoke(test_input)
print(f"\nFinal result: {result['extraction_result']}")
print(f"Retries used: {result['retry_count']}")

Here’s what you get:

python
Final result: {'company': 'Apple', 'quarter': 'Q2', 'year': 2025, 'revenue_millions': 94800.0, 'profit_millions': 24200.0}
Retries used: 0

Zero retries — the LLM got it right on the first attempt. But the safety net is there for when it doesn’t.

KEY INSIGHT: The self-correcting pattern has three ingredients: structured extraction, error feedback, and a conditional retry loop. Remove any one and it breaks — without extraction you get text, without feedback the LLM repeats mistakes, without the loop you crash on first failure.
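You can dry-run those retry mechanics without any API calls by stubbing the extractor with canned responses. This sketch replays the same loop in plain Python — two invalid "LLM attempts" followed by a valid one:

```python
from pydantic import BaseModel, ValidationError, field_validator

class Report(BaseModel):
    quarter: str

    @field_validator("quarter")
    @classmethod
    def validate_quarter(cls, v):
        if v not in {"Q1", "Q2", "Q3", "Q4"}:
            raise ValueError(f"Quarter must be Q1-Q4, got '{v}'")
        return v

# Canned "LLM responses": two invalid attempts, then a valid one
attempts = iter(["Q7", "Q9", "Q2"])
max_retries, retry_count = 3, 0
result, last_error = None, None

while result is None and retry_count <= max_retries:
    candidate = next(attempts)
    try:
        result = Report(quarter=candidate)
    except ValidationError as e:
        last_error = str(e)  # in the real graph, this goes back into the prompt
        retry_count += 1

print(result.quarter, retry_count)
```

The loop terminates with a valid `Report` after two retries, and `last_error` holds exactly the text the extraction node would feed back to the model.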

Exercise 1: Add a Validator and Test the Retry Loop

You’ve seen how @field_validator catches bad data and how the extraction node feeds errors back. Now try it yourself.

Add a @field_validator to FinancialReport that rejects years before 2000 or after 2030. Then invoke the extraction graph with text that mentions a plausible but invalid year (like “fiscal year 1998”) to see the retry loop in action.

Hint 1

Add a new `@field_validator("year")` method that raises `ValueError` if the year is outside the 2000-2030 range.

Hint 2 (nearly the answer)
python
@field_validator("year")
@classmethod
def validate_year(cls, v):
    if v < 2000 or v > 2030:
        raise ValueError(f"Year must be 2000-2030, got {v}")
    return v

Then test with input text like: “Acme Corp reported Q3 1998 revenue of $50 million with $5 million profit.”

Solution
python
class FinancialReportStrict(BaseModel):
    """Extract quarterly financial data with strict year validation."""
    company: str = Field(description="Company name")
    quarter: str = Field(description="Quarter: Q1, Q2, Q3, Q4")
    year: int = Field(description="Fiscal year (2000-2030)")
    revenue_millions: float = Field(description="Revenue in millions USD")
    profit_millions: float = Field(description="Net profit in millions USD")

    @field_validator("quarter")
    @classmethod
    def validate_quarter(cls, v):
        valid = {"Q1", "Q2", "Q3", "Q4"}
        if v not in valid:
            raise ValueError(f"Quarter must be one of {valid}, got '{v}'")
        return v

    @field_validator("year")
    @classmethod
    def validate_year(cls, v):
        if v < 2000 or v > 2030:
            raise ValueError(f"Year must be 2000-2030, got {v}")
        return v

    @field_validator("revenue_millions")
    @classmethod
    def validate_revenue(cls, v):
        if v < 0:
            raise ValueError(f"Revenue cannot be negative, got {v}")
        return v

The retry loop catches the `ValueError`, feeds the error message back to the LLM, and the LLM adjusts. The validation error message tells the model exactly what range is acceptable.

Multiple Schemas and Dynamic Routing

What if your agent needs to extract different types of data depending on the input? A support ticket system might handle bug reports, feature requests, and billing inquiries — each with a different schema.

The pattern: classify first, then route to the right extractor. A quick classification call costs almost nothing and prevents expensive extraction failures.

python
class BugReport(BaseModel):
    """Extract bug report details."""
    title: str = Field(description="Short bug title")
    severity: str = Field(
        description="Bug severity: critical, high, medium, low"
    )
    steps_to_reproduce: list[str] = Field(
        description="Steps that trigger the bug"
    )
    expected_behavior: str = Field(
        description="What should happen instead"
    )

    @field_validator("severity")
    @classmethod
    def validate_severity(cls, v):
        valid = {"critical", "high", "medium", "low"}
        if v.lower() not in valid:
            raise ValueError(
                f"Severity must be one of {valid}, got '{v}'"
            )
        return v.lower()


class FeatureRequest(BaseModel):
    """Extract feature request details."""
    title: str = Field(description="Short feature title")
    priority: str = Field(
        description="Priority: must-have, nice-to-have, future"
    )
    use_case: str = Field(
        description="Why the user needs this feature"
    )
    suggested_solution: Optional[str] = Field(
        default=None,
        description="User's suggested implementation, if any"
    )

print("Support ticket schemas defined")
python
Support ticket schemas defined

The classifier node reads the ticket and decides which schema applies. This is a plain .invoke() call — no structured output needed for a simple classification.

python
class TicketState(TypedDict):
    messages: Annotated[list, operator.add]
    ticket_type: Optional[str]
    extraction_result: Optional[dict]
    validation_error: Optional[str]
    retry_count: int
    max_retries: int


def classify_ticket(state: TicketState):
    """Classify the ticket type before extraction."""
    messages = state["messages"]
    classifier_prompt = SystemMessage(
        content=(
            "Classify this support ticket as exactly one of: "
            "'bug_report' or 'feature_request'. "
            "Reply with just the classification, nothing else."
        )
    )
    response = llm.invoke([classifier_prompt] + messages)
    ticket_type = response.content.strip().lower()

    if "bug" in ticket_type:
        ticket_type = "bug_report"
    else:
        ticket_type = "feature_request"

    return {
        "messages": [
            AIMessage(content=f"Classified as: {ticket_type}")
        ],
        "ticket_type": ticket_type,
    }

print("Classifier node defined")
python
Classifier node defined

TIP: Always classify before extracting. Trying to extract with the wrong schema wastes tokens and retries. One cheap classification call prevents expensive downstream failures.

Each extraction node uses the appropriate schema and handles its own retry logic. The structure mirrors the single-schema pattern we built earlier.

python
def extract_bug_report(state: TicketState):
    """Extract structured bug report data."""
    messages = state["messages"]

    if state.get("validation_error"):
        messages = messages + [
            HumanMessage(
                content=f"Fix this error: {state['validation_error']}"
            )
        ]

    try:
        structured = llm.with_structured_output(BugReport)
        result = structured.invoke(messages)
        return {
            "messages": [
                AIMessage(content="Bug report extracted")
            ],
            "extraction_result": result.model_dump(),
            "validation_error": None,
            "retry_count": state.get("retry_count", 0),
        }
    except Exception as e:
        return {
            "messages": [
                AIMessage(content=f"Bug extraction failed: {e}")
            ],
            "extraction_result": None,
            "validation_error": str(e),
            "retry_count": state.get("retry_count", 0) + 1,
        }


def extract_feature_request(state: TicketState):
    """Extract structured feature request data."""
    messages = state["messages"]

    if state.get("validation_error"):
        messages = messages + [
            HumanMessage(
                content=f"Fix this error: {state['validation_error']}"
            )
        ]

    try:
        structured = llm.with_structured_output(FeatureRequest)
        result = structured.invoke(messages)
        return {
            "messages": [
                AIMessage(content="Feature request extracted")
            ],
            "extraction_result": result.model_dump(),
            "validation_error": None,
            "retry_count": state.get("retry_count", 0),
        }
    except Exception as e:
        return {
            "messages": [
                AIMessage(
                    content=f"Feature extraction failed: {e}"
                )
            ],
            "extraction_result": None,
            "validation_error": str(e),
            "retry_count": state.get("retry_count", 0) + 1,
        }

print("Schema-specific extraction nodes defined")
python
Schema-specific extraction nodes defined

Building the Multi-Schema Graph

Now wire the classifier and extractors into one graph. The conditional edges handle both initial routing (which schema?) and retry logic (did extraction succeed?).

python
def route_to_extractor(
    state: TicketState,
) -> Literal["extract_bug", "extract_feature"]:
    """Route to the correct extraction node."""
    if state.get("ticket_type") == "bug_report":
        return "extract_bug"
    return "extract_feature"


def route_after_extraction(
    state: TicketState,
) -> Literal["extract_bug", "extract_feature", "done"]:
    """Check extraction result and decide next step."""
    if state.get("extraction_result") is not None:
        return "done"
    if state.get("retry_count", 0) >= state.get("max_retries", 3):
        return "done"
    if state.get("ticket_type") == "bug_report":
        return "extract_bug"
    return "extract_feature"


def done_node(state: TicketState):
    """Final output node."""
    result = state.get("extraction_result", {})
    if result:
        return {
            "messages": [AIMessage(content=f"Final: {result}")],
        }
    return {
        "messages": [
            AIMessage(
                content="Extraction failed after all retries."
            )
        ],
    }

print("Routing functions and done node defined")
python
Routing functions and done node defined
python
ticket_graph_builder = StateGraph(TicketState)

ticket_graph_builder.add_node("classify", classify_ticket)
ticket_graph_builder.add_node("extract_bug", extract_bug_report)
ticket_graph_builder.add_node(
    "extract_feature", extract_feature_request
)
ticket_graph_builder.add_node("done", done_node)

ticket_graph_builder.add_edge(START, "classify")
ticket_graph_builder.add_conditional_edges(
    "classify",
    route_to_extractor,
    {
        "extract_bug": "extract_bug",
        "extract_feature": "extract_feature",
    },
)
ticket_graph_builder.add_conditional_edges(
    "extract_bug",
    route_after_extraction,
    {"extract_bug": "extract_bug", "done": "done"},
)
ticket_graph_builder.add_conditional_edges(
    "extract_feature",
    route_after_extraction,
    {"extract_feature": "extract_feature", "done": "done"},
)
ticket_graph_builder.add_edge("done", END)

ticket_graph = ticket_graph_builder.compile()
print("Multi-schema ticket graph compiled")
python
Multi-schema ticket graph compiled

Let’s test with a bug report.

python
bug_input = {
    "messages": [
        HumanMessage(
            content=(
                "The export button is broken. When I click it on "
                "the dashboard page, nothing happens. No file "
                "downloads. It used to work last week. I need "
                "this fixed ASAP because I can't generate "
                "monthly reports."
            )
        ),
    ],
    "ticket_type": None,
    "extraction_result": None,
    "validation_error": None,
    "retry_count": 0,
    "max_retries": 3,
}

result = ticket_graph.invoke(bug_input)
print(f"Type: {result['ticket_type']}")
print(f"Result: {result['extraction_result']}")
print(f"Retries: {result['retry_count']}")

You’ll see something like this:

python
Type: bug_report
Result: {'title': 'Export button not working on dashboard', 'severity': 'critical', 'steps_to_reproduce': ['Go to the dashboard page', 'Click the export button', 'Observe that nothing happens and no file downloads'], 'expected_behavior': 'A file should download containing the monthly report data'}
Retries: 0

The classifier identified it as a bug report, routed to the right extractor, and produced a validated BugReport object. The severity validator ensured “critical” is one of the allowed values.

Nested Pydantic Models for Complex Extraction

Real-world data isn’t flat. An invoice has line items. A research paper has multiple authors with affiliations. When you need hierarchical data, use nested Pydantic models.

The LLM reads the entire nested schema through the function-calling API and fills all levels in a single call. You don’t need separate extraction steps for each level.
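You can see how the nesting travels by inspecting the generated JSON schema — nested models land under `$defs`, and the parent field references them via `$ref`. A minimal two-level sketch:

```python
from pydantic import BaseModel, Field

class LineItem(BaseModel):
    description: str = Field(description="Item description")
    quantity: int = Field(description="Number of units")

class Invoice(BaseModel):
    vendor: str = Field(description="Vendor name")
    line_items: list[LineItem] = Field(description="Invoice line items")

schema = Invoice.model_json_schema()
print("LineItem" in schema["$defs"])                # nested model gets its own definition
print(schema["properties"]["line_items"]["items"])  # parent field points at it via $ref
```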

python
class LineItem(BaseModel):
    """A single line item in an invoice."""
    description: str = Field(description="Item description")
    quantity: int = Field(description="Number of units")
    unit_price: float = Field(
        description="Price per unit in dollars"
    )
    total: float = Field(
        description="Line total (quantity * unit_price)"
    )

    @field_validator("quantity")
    @classmethod
    def validate_quantity(cls, v):
        if v <= 0:
            raise ValueError(
                f"Quantity must be positive, got {v}"
            )
        return v


class Invoice(BaseModel):
    """Extract structured invoice data."""
    vendor: str = Field(description="Vendor/seller name")
    invoice_number: str = Field(
        description="Invoice ID or number"
    )
    date: str = Field(
        description="Invoice date in YYYY-MM-DD format"
    )
    line_items: list[LineItem] = Field(
        description="Individual line items on the invoice"
    )
    subtotal: float = Field(description="Sum before tax")
    tax_rate: float = Field(
        description="Tax rate as a percentage"
    )
    total: float = Field(
        description="Final total including tax"
    )

    @field_validator("tax_rate")
    @classmethod
    def validate_tax_rate(cls, v):
        if v < 0 or v > 100:
            raise ValueError(
                f"Tax rate must be 0-100%, got {v}"
            )
        return v

print("Invoice schema with nested LineItem model defined")
python
Invoice schema with nested LineItem model defined

Let’s extract from a realistic invoice.

python
invoice_text = """
Invoice #INV-2025-0847
From: CloudStack Solutions
Date: March 5, 2025

Items:
- 3x GPU Instance (A100) at $4.50/hr for 720 hours = $9,720.00
- 1x Storage (5TB) at $0.023/GB/mo = $115.00
- 2x Load Balancer at $25.00/mo = $50.00

Subtotal: $9,885.00
Tax (8.5%): $840.23
Total: $10,725.23
"""

structured_llm = llm.with_structured_output(Invoice)
invoice = structured_llm.invoke(
    f"Extract invoice data:\n\n{invoice_text}"
)

print(f"Vendor: {invoice.vendor}")
print(f"Invoice #: {invoice.invoice_number}")
print(f"Items: {len(invoice.line_items)}")
for item in invoice.line_items:
    print(
        f"  - {item.description}: "
        f"{item.quantity} x ${item.unit_price} = ${item.total}"
    )
print(f"Total: ${invoice.total}")

The extraction produces:

python
Vendor: CloudStack Solutions
Invoice #: INV-2025-0847
Items: 3
  - GPU Instance (A100): 3 x $4.5 = $9720.0
  - Storage (5TB): 1 x $0.023 = $115.0
  - Load Balancer: 2 x $25.0 = $50.0
Total: $10725.23

Three nested line items, each with their own validated fields, all extracted in one call. The validators on LineItem and Invoice ensure data quality at every level.
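When a nested field fails validation, Pydantic reports the exact path to the offending value — which is what keeps error feedback useful at any depth of nesting. A sketch with a deliberately bad quantity (stripped-down versions of the models above):

```python
from pydantic import BaseModel, ValidationError, field_validator

class LineItem(BaseModel):
    description: str
    quantity: int

    @field_validator("quantity")
    @classmethod
    def validate_quantity(cls, v):
        if v <= 0:
            raise ValueError(f"Quantity must be positive, got {v}")
        return v

class Invoice(BaseModel):
    vendor: str
    line_items: list[LineItem]

loc = None
try:
    Invoice.model_validate({
        "vendor": "CloudStack Solutions",
        "line_items": [
            {"description": "GPU Instance", "quantity": 3},
            {"description": "Storage", "quantity": 0},  # invalid: not positive
        ],
    })
except ValidationError as e:
    loc = e.errors()[0]["loc"]  # path to the bad value
    print(loc)
```

The `loc` tuple names the field, the list index, and the nested field, so a retrying LLM knows exactly which line item to fix.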

KEY INSIGHT: Nested Pydantic models let you extract complex, hierarchical data in a single LLM call. Define the full schema and the model fills all levels at once — no need for multiple extraction passes.

Exercise 2: Build a Multi-Schema Extraction Pipeline

You’ve seen the classify-then-extract pattern. Now build one yourself.

Create two Pydantic schemas: MeetingNotes (with fields: title, date, attendees list, action_items list, decisions list) and StatusUpdate (with fields: project_name, status as "on_track"/"at_risk"/"blocked", completed_tasks list, blockers list). Add a validator that rejects invalid status values.

Then build a LangGraph pipeline that classifies incoming text as either “meeting_notes” or “status_update” and routes to the correct extractor.

Hint 1

Follow the same pattern as the ticket system: create a `classify_node` that uses plain LLM invoke, then a routing function that sends to either `extract_meeting` or `extract_status` based on classification.

Hint 2 (nearly the answer)
python
class MeetingNotes(BaseModel):
    title: str = Field(description="Meeting title or topic")
    date: str = Field(description="Meeting date in YYYY-MM-DD format")
    attendees: list[str] = Field(description="List of attendee names")
    action_items: list[str] = Field(description="Action items assigned")
    decisions: list[str] = Field(description="Key decisions made")

class StatusUpdate(BaseModel):
    project_name: str = Field(description="Project name")
    status: str = Field(description="Status: on_track, at_risk, or blocked")
    completed_tasks: list[str] = Field(description="Tasks completed this period")
    blockers: list[str] = Field(description="Current blockers, if any")

    @field_validator("status")
    @classmethod
    def validate_status(cls, v):
        valid = {"on_track", "at_risk", "blocked"}
        if v.lower() not in valid:
            raise ValueError(f"Status must be one of {valid}")
        return v.lower()

Wire the graph exactly like the ticket system — classify, route, extract, done.

Solution
python
# Full solution follows the TicketState pattern:
# 1. Define MeetingNotes and StatusUpdate schemas (above)
# 2. Create DocState TypedDict with doc_type, extraction_result, etc.
# 3. Build classify_doc, extract_meeting, extract_status nodes
# 4. Wire with conditional edges: classify -> route -> extract -> done
# The key insight: the classify step prevents wrong-schema extraction

Common Mistakes and How to Fix Them

Mistake 1: Missing field descriptions

Wrong:

python
class UserProfile(BaseModel):
    name: str
    age: int
    city: str

Why it’s wrong: Without Field(description=...), the LLM has no guidance. “city” could mean birth city, current city, or favorite city. You’ll get inconsistent results.

Correct:

python
class UserProfile(BaseModel):
    name: str = Field(description="User's full legal name")
    age: int = Field(description="User's current age in years")
    city: str = Field(
        description="User's current city of residence"
    )

Mistake 2: No retry limit on self-correcting loops

Wrong:

python
def route(state):
    if state["extraction_result"]:
        return "done"
    return "extract"  # Loops forever on persistent errors!

Why it’s wrong: If the LLM consistently can’t produce valid output, this runs until LangGraph’s recursion limit triggers a cryptic error.

Correct:

python
def route(state):
    if state["extraction_result"]:
        return "done"
    if state["retry_count"] >= state["max_retries"]:
        return "fallback"
    return "extract"

Mistake 3: Not feeding errors back to the LLM

Wrong:

python
except Exception as e:
    return {"retry_count": state["retry_count"] + 1}

Why it’s wrong: The LLM doesn’t know what failed, so it’ll likely make the same mistake. The retry loop becomes random guessing.

Correct:

python
except Exception as e:
    return {
        "validation_error": str(e),
        "retry_count": state["retry_count"] + 1,
    }

Then include the error in the next prompt so the LLM fixes the specific issue.
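One way to thread the stored error back into the conversation is a small helper that appends a corrective message before the next attempt. This is a sketch using plain role/content dicts; in the article's graphs you would append a `HumanMessage` instead:

```python
def with_error_feedback(messages, validation_error):
    """Return a new message list with the validation error appended
    as a user turn, so the next LLM call sees exactly what failed."""
    if not validation_error:
        return list(messages)
    feedback = {
        "role": "user",
        "content": (
            f"Your previous response failed validation: {validation_error}\n"
            "Fix these specific errors and try again."
        ),
    }
    return list(messages) + [feedback]
```

Because the original messages stay in the list, the model sees both the task and the precise failure, which is what turns retries from random guessing into targeted correction.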

Advanced: Confidence Scoring and Conditional Review

Sometimes pass/fail validation isn’t enough. You want the LLM to rate its own confidence, and only flag low-confidence results for human review.

Add a confidence field to your schema. After extraction, route high-confidence results to output and low-confidence results to a review queue.

python
class ExtractedEntity(BaseModel):
    """Extract a named entity with confidence scoring."""
    entity_name: str = Field(
        description="The extracted entity"
    )
    entity_type: str = Field(
        description="Type: person, organization, location, event"
    )
    confidence: float = Field(
        description=(
            "Your confidence in this extraction, 0.0 to 1.0. "
            "Use 0.9+ when text is explicit. "
            "Use 0.5-0.8 when inferring. "
            "Use below 0.5 when guessing."
        )
    )
    evidence: str = Field(
        description="Quote from text supporting this extraction"
    )

    @field_validator("confidence")
    @classmethod
    def validate_confidence(cls, v):
        if v < 0.0 or v > 1.0:
            raise ValueError(
                f"Confidence must be 0.0-1.0, got {v}"
            )
        return round(v, 2)

    @field_validator("entity_type")
    @classmethod
    def validate_entity_type(cls, v):
        valid = {"person", "organization", "location", "event"}
        if v.lower() not in valid:
            raise ValueError(
                f"entity_type must be one of {valid}"
            )
        return v.lower()


structured_llm = llm.with_structured_output(ExtractedEntity)

clear_result = structured_llm.invoke(
    "Extract the main entity: "
    "Microsoft CEO Satya Nadella announced the partnership."
)
print(f"Entity: {clear_result.entity_name}")
print(f"Type: {clear_result.entity_type}")
print(f"Confidence: {clear_result.confidence}")
print(f"Evidence: {clear_result.evidence}")

The LLM reports high confidence for explicit mentions:

python
Entity: Satya Nadella
Type: person
Confidence: 0.95
Evidence: Microsoft CEO Satya Nadella announced the partnership.

You can route results below 0.7 to human review and let high-confidence results flow through automatically.
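The routing function for that split can be tiny. The 0.7 threshold and the node names ("output", "human_review") here are illustrative, not taken from the article's graphs:

```python
def route_by_confidence(state, threshold=0.7):
    """Send low-confidence extractions to a human review queue;
    let confident ones flow straight to output."""
    confidence = state["extraction_result"]["confidence"]
    return "output" if confidence >= threshold else "human_review"
```

Wire it into a conditional edge exactly like the pass/fail routers earlier in the article.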

TIP: Calibrate confidence thresholds on real data. LLMs tend to be overconfident. Test with 50+ examples to find the threshold where reported confidence correlates with actual accuracy.
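One hedged way to run that calibration: collect (reported_confidence, was_correct) pairs from labeled examples, then sweep candidate thresholds and check accuracy among the auto-accepted items. The data below is a toy illustration:

```python
def threshold_accuracy(samples, threshold):
    """samples: list of (confidence, was_correct) pairs.
    Returns (accuracy, coverage) for items at or above the threshold."""
    accepted = [ok for conf, ok in samples if conf >= threshold]
    if not accepted:
        return 0.0, 0.0
    return sum(accepted) / len(accepted), len(accepted) / len(samples)

# Toy sweep: pick the threshold where accuracy is acceptable
# without coverage collapsing to near zero.
samples = [(0.95, True), (0.9, True), (0.8, False), (0.6, True), (0.4, False)]
for t in (0.5, 0.7, 0.9):
    acc, cov = threshold_accuracy(samples, t)
    print(f"threshold={t}: accuracy={acc:.2f}, coverage={cov:.2f}")
```

With real data, raising the threshold trades coverage for accuracy; the sweep makes that trade-off visible.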

When NOT to Use Structured Output

Structured output is powerful, but it’s not always the right choice. Here are scenarios where plain text works better.

Creative tasks. Poems, brainstorms, email drafts — forcing structured output constrains creativity. Let the LLM write freely.

Simple yes/no questions. A Pydantic model with one boolean field is overkill. Use basic prompt engineering.

Highly ambiguous source text. When the text is too vague to reliably fill schema fields, you’ll burn through retries without improvement. Have the LLM summarize in plain text instead.

Schemas with 20+ fields. Very large schemas push the limits of what LLMs can fill accurately in one call. Break them into smaller models and extract in stages.
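The staging itself can be a thin loop over the smaller schemas that merges the partial results. Here `extract_fn` is a placeholder standing in for a structured-output call such as `llm.with_structured_output(model).invoke(text)`:

```python
def extract_in_stages(text, stage_models, extract_fn):
    """Run one structured-output call per sub-schema and merge the
    resulting dicts. Later stages win on key collisions."""
    merged = {}
    for model in stage_models:
        merged.update(extract_fn(text, model))
    return merged
```

Each stage stays small enough for the LLM to fill accurately, and the merged dict can be validated against the full schema at the end.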

WARNING: Don’t use structured output as a crutch for unclear requirements. If you can’t define the schema clearly, the LLM can’t fill it clearly either. Clarify what you need before writing the Pydantic model.

Putting It All Together: Production Pipeline

Let’s combine everything into a complete self-correcting extraction pipeline. This one takes job postings, extracts structured data with nested models and validators, and retries on failure.

The JobPosting schema uses a nested Salary model. The @field_validator on required_skills enforces a maximum of 10 items. This is the kind of business rule that type checking alone can’t catch.

python
class Salary(BaseModel):
    """Salary information."""
    min_amount: Optional[int] = Field(
        default=None,
        description="Minimum salary in USD per year"
    )
    max_amount: Optional[int] = Field(
        default=None,
        description="Maximum salary in USD per year"
    )
    currency: str = Field(
        default="USD", description="Currency code"
    )


class JobPosting(BaseModel):
    """Extract structured data from a job posting."""
    title: str = Field(description="Job title")
    company: str = Field(description="Hiring company name")
    location: str = Field(
        description="Job location or 'Remote'"
    )
    experience_years: Optional[int] = Field(
        default=None,
        description="Required years of experience"
    )
    salary: Optional[Salary] = Field(
        default=None,
        description="Salary range if mentioned"
    )
    required_skills: list[str] = Field(
        description="Required technical skills, max 10"
    )
    is_remote: bool = Field(
        description="True if position allows remote work"
    )

    @field_validator("required_skills")
    @classmethod
    def validate_skills(cls, v):
        if len(v) > 10:
            raise ValueError(
                f"Max 10 required skills, got {len(v)}"
            )
        return v

print("JobPosting schema with nested Salary defined")
python
JobPosting schema with nested Salary defined

The full pipeline graph follows the same pattern: extract, route, retry or succeed.

python
class JobExtractionState(TypedDict):
    messages: Annotated[list, operator.add]
    extraction_result: Optional[dict]
    validation_error: Optional[str]
    retry_count: int
    max_retries: int


def extract_job(state: JobExtractionState):
    """Extract job posting data with self-correction."""
    messages = state["messages"]
    retry_count = state.get("retry_count", 0)

    if state.get("validation_error"):
        messages = messages + [
            HumanMessage(
                content=(
                    f"Previous attempt failed: "
                    f"{state['validation_error']}\n"
                    f"Fix the errors and try again."
                )
            )
        ]

    try:
        structured = llm.with_structured_output(JobPosting)
        result = structured.invoke(messages)
        return {
            "messages": [
                AIMessage(
                    content="Job posting extracted successfully"
                )
            ],
            "extraction_result": result.model_dump(),
            "validation_error": None,
            "retry_count": retry_count,
        }
    except Exception as e:
        return {
            "messages": [
                AIMessage(content=f"Extraction error: {e}")
            ],
            "extraction_result": None,
            "validation_error": str(e),
            "retry_count": retry_count + 1,
        }


def route_job_extraction(
    state: JobExtractionState,
) -> Literal["extract", "success", "failure"]:
    if state.get("extraction_result") is not None:
        return "success"
    if state.get("retry_count", 0) >= state.get("max_retries", 3):
        return "failure"
    return "extract"


def success_node(state: JobExtractionState):
    result = state["extraction_result"]
    return {
        "messages": [
            AIMessage(
                content=(
                    f"Extracted: {result['title']} "
                    f"at {result['company']}"
                )
            )
        ],
    }


def failure_node(state: JobExtractionState):
    return {
        "messages": [
            AIMessage(
                content="Could not extract job posting data."
            )
        ],
    }


job_graph_builder = StateGraph(JobExtractionState)
job_graph_builder.add_node("extract", extract_job)
job_graph_builder.add_node("success", success_node)
job_graph_builder.add_node("failure", failure_node)

job_graph_builder.add_edge(START, "extract")
job_graph_builder.add_conditional_edges(
    "extract",
    route_job_extraction,
    {
        "extract": "extract",
        "success": "success",
        "failure": "failure",
    },
)
job_graph_builder.add_edge("success", END)
job_graph_builder.add_edge("failure", END)

job_graph = job_graph_builder.compile()
print("Job extraction pipeline compiled")
python
Job extraction pipeline compiled
python
job_text = """
Senior ML Engineer — DataFlow Inc.
Location: San Francisco, CA (Hybrid — 3 days in office)
Salary: $180,000 - $250,000/year

We're looking for an ML engineer with 5+ years of experience
to join our platform team. You'll build and deploy machine
learning pipelines at scale.

Requirements:
- Python, PyTorch, and TensorFlow
- Experience with Kubernetes and Docker
- Strong understanding of MLOps (MLflow, Kubeflow)
- SQL and data pipeline experience
- Familiarity with cloud platforms (AWS/GCP)
"""

job_result = job_graph.invoke({
    "messages": [
        SystemMessage(
            content="Extract structured data from this job posting."
        ),
        HumanMessage(content=job_text),
    ],
    "extraction_result": None,
    "validation_error": None,
    "retry_count": 0,
    "max_retries": 3,
})

extracted = job_result["extraction_result"]
print(f"Title: {extracted['title']}")
print(f"Company: {extracted['company']}")
print(f"Location: {extracted['location']}")
print(f"Remote: {extracted['is_remote']}")
print(f"Experience: {extracted['experience_years']} years")
if extracted.get("salary"):
    sal = extracted["salary"]
    print(f"Salary: ${sal['min_amount']:,} - ${sal['max_amount']:,}")
print(f"Skills: {', '.join(extracted['required_skills'])}")
print(f"Retries: {job_result['retry_count']}")

Here’s what the pipeline extracts:

python
Title: Senior ML Engineer
Company: DataFlow Inc.
Location: San Francisco, CA
Remote: False
Experience: 5 years
Salary: $180,000 - $250,000
Skills: Python, PyTorch, TensorFlow, Kubernetes, Docker, MLOps, MLflow, Kubeflow, SQL, AWS/GCP
Retries: 0

The pipeline handled a nested salary model, validated skill count, correctly identified hybrid (not remote), and extracted all required fields. The self-correcting loop sat ready but wasn’t needed — well-designed schemas with clear descriptions often succeed on the first attempt.

Complete Code

Click to expand the full script (copy-paste and run)
python
# Complete code from: Structured Output and Self-Correcting Agents in LangGraph
# Requires: pip install langchain-openai langgraph pydantic python-dotenv
# Python 3.10+
# Set OPENAI_API_KEY in your .env file

import os
import operator
from typing import TypedDict, Annotated, Optional, Literal
from dotenv import load_dotenv
from pydantic import BaseModel, Field, field_validator
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage
from langgraph.graph import StateGraph, START, END

load_dotenv()

llm = ChatOpenAI(model="gpt-4o-mini")

# --- Schema: Financial Report ---

class FinancialReport(BaseModel):
    """Extract quarterly financial data from earnings text."""
    company: str = Field(description="Company name")
    quarter: str = Field(description="Quarter, e.g. Q1, Q2, Q3, Q4")
    year: int = Field(description="Fiscal year")
    revenue_millions: float = Field(description="Revenue in millions of dollars")
    profit_millions: float = Field(description="Net profit in millions of dollars")

    @field_validator("quarter")
    @classmethod
    def validate_quarter(cls, v):
        valid = {"Q1", "Q2", "Q3", "Q4"}
        if v not in valid:
            raise ValueError(f"Quarter must be one of {valid}, got '{v}'")
        return v

    @field_validator("revenue_millions")
    @classmethod
    def validate_revenue(cls, v):
        if v < 0:
            raise ValueError(f"Revenue cannot be negative, got {v}")
        return v


# --- Self-Correcting Extraction Graph ---

class ExtractionState(TypedDict):
    messages: Annotated[list, operator.add]
    extraction_result: Optional[dict]
    validation_error: Optional[str]
    retry_count: int
    max_retries: int


def extraction_node(state: ExtractionState):
    messages = state["messages"]
    retry_count = state.get("retry_count", 0)

    if state.get("validation_error"):
        feedback = HumanMessage(
            content=(
                f"Your previous response failed validation: "
                f"{state['validation_error']}\n\n"
                f"Please fix the errors and try again."
            )
        )
        messages = messages + [feedback]

    try:
        structured_llm = llm.with_structured_output(FinancialReport)
        result = structured_llm.invoke(messages)
        return {
            "messages": [AIMessage(content=f"Extracted: {result.model_dump_json()}")],
            "extraction_result": result.model_dump(),
            "validation_error": None,
            "retry_count": retry_count,
        }
    except Exception as e:
        return {
            "messages": [AIMessage(content=f"Extraction failed: {str(e)}")],
            "extraction_result": None,
            "validation_error": str(e),
            "retry_count": retry_count + 1,
        }


def route_extraction(state: ExtractionState) -> Literal["extract", "output", "fallback"]:
    if state.get("extraction_result") is not None:
        return "output"
    if state.get("retry_count", 0) >= state.get("max_retries", 3):
        return "fallback"
    return "extract"


def output_node(state: ExtractionState):
    result = state["extraction_result"]
    summary = (
        f"Successfully extracted: {result['company']} "
        f"{result['quarter']} {result['year']} — "
        f"Revenue: ${result['revenue_millions']}M, "
        f"Profit: ${result['profit_millions']}M"
    )
    return {"messages": [AIMessage(content=summary)]}


def fallback_node(state: ExtractionState):
    retries = state.get("retry_count", 0)
    last_error = state.get("validation_error", "Unknown")
    return {
        "messages": [
            AIMessage(content=f"Extraction failed after {retries} attempts. Last error: {last_error}.")
        ],
        "extraction_result": {},
    }


graph_builder = StateGraph(ExtractionState)
graph_builder.add_node("extract", extraction_node)
graph_builder.add_node("output", output_node)
graph_builder.add_node("fallback", fallback_node)
graph_builder.add_edge(START, "extract")
graph_builder.add_conditional_edges(
    "extract", route_extraction,
    {"extract": "extract", "output": "output", "fallback": "fallback"},
)
graph_builder.add_edge("output", END)
graph_builder.add_edge("fallback", END)

extraction_graph = graph_builder.compile()

# --- Test ---
test_input = {
    "messages": [
        SystemMessage(content="Extract financial data from the user's text."),
        HumanMessage(
            content="Apple reported Q2 2025 earnings yesterday. Revenue came in at $94.8 billion for the quarter, with net profit of $24.2 billion."
        ),
    ],
    "extraction_result": None,
    "validation_error": None,
    "retry_count": 0,
    "max_retries": 3,
}

result = extraction_graph.invoke(test_input)
print(f"Final result: {result['extraction_result']}")
print(f"Retries used: {result['retry_count']}")

print("\nScript completed successfully.")

Frequently Asked Questions

Can I use structured output without LangGraph?

Yes. LangChain’s .with_structured_output() works with any chat model that supports function calling — no graph needed. LangGraph’s value is the retry loop. If you don’t need automatic self-correction on validation failure, llm.with_structured_output(MyModel).invoke(prompt) is simpler and fine.

Does structured output work with open-source models?

It depends on function-calling support. Models like Llama 3 and Mistral support tool use, so .with_structured_output() works. Smaller models without function-calling capability need PydanticOutputParser instead.

python
from langchain_core.output_parsers import PydanticOutputParser

parser = PydanticOutputParser(pydantic_object=CompanyInfo)
format_instructions = parser.get_format_instructions()
# Add format_instructions to your prompt text

This injects format instructions into the prompt and parses the text response. It’s less reliable but works with any model.

How do I handle optional fields the LLM skips?

Use Optional[type] = Field(default=None, ...) in your Pydantic model. This tells both Pydantic and the LLM that the field can be null. Without Optional, a missing field triggers a ValidationError and wastes a retry cycle when the data genuinely isn’t in the source text.
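For example, an illustrative model (not from the article's pipeline) where the email genuinely may be absent:

```python
from typing import Optional
from pydantic import BaseModel, Field

class ContactCard(BaseModel):
    """Extract contact details; email may be missing from the text."""
    name: str = Field(description="Contact's full name")
    email: Optional[str] = Field(
        default=None,
        description="Email address, or null if not mentioned"
    )
```

`ContactCard(name="Ada Lovelace")` validates cleanly with `email=None` instead of raising a ValidationError and burning a retry.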

What’s the performance cost of self-correcting loops?

Each retry is a full LLM call. With GPT-4o-mini, that’s roughly an extra second of latency and a fraction of a cent in tokens per retry. Set max_retries=3 as a sensible default. Well-designed schemas with clear field descriptions rarely need more than one retry. If you’re hitting the limit regularly, your schema needs work — not more retries.

Can I use structured output at the graph level instead of per-node?

Not directly. LangGraph’s StateGraph doesn’t have a built-in structured output mode. You apply .with_structured_output() to the LLM inside individual nodes. If you need the entire graph’s final output to match a schema, validate it in the last node before returning.
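A terminal validation node might look like this sketch, where `FinalReport` is a stand-in for whatever schema your graph should ultimately emit:

```python
from pydantic import BaseModel, Field, ValidationError

class FinalReport(BaseModel):
    """Stand-in schema for the graph's final payload."""
    company: str = Field(description="Company name")
    revenue_millions: float = Field(description="Revenue in millions USD")

def validate_output_node(state):
    """Terminal node: validate the accumulated result against the
    schema before the graph returns it."""
    try:
        report = FinalReport.model_validate(state["extraction_result"])
        return {"extraction_result": report.model_dump(),
                "validation_error": None}
    except ValidationError as e:
        return {"extraction_result": None, "validation_error": str(e)}
```

Placing this just before END guarantees callers never receive a payload that silently drifted from the schema.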

Summary

Structured output transforms LLM responses from unpredictable text into validated, typed data your application can use directly. The building blocks: Pydantic models define the schema, .with_structured_output() handles the LLM integration, and LangGraph’s conditional edges create self-correcting retry loops.

Start with .with_structured_output() for simple cases. When you need validation beyond type checking, add @field_validator decorators. When you need automatic retry on failure, wrap extraction in a LangGraph node with a conditional edge that loops back on error.

The pattern scales from simple flat schemas to nested models with multiple extractors. Classify first, route to the right schema, extract with validation, and retry with error feedback. That’s the complete toolkit.

Practice exercise: Build a self-correcting extraction agent that takes restaurant reviews and outputs structured data: restaurant name, cuisine type, rating (1-5, validated), price range, and mentioned dishes. Include a @field_validator for ratings and a retry loop with three attempts.

Click to see the solution
python
class RestaurantReview(BaseModel):
    """Extract structured restaurant review data."""
    restaurant_name: str = Field(
        description="Name of the restaurant"
    )
    cuisine: str = Field(description="Type of cuisine")
    rating: int = Field(description="Rating from 1 to 5")
    price_range: str = Field(
        description="Price range: budget, moderate, upscale, fine-dining"
    )
    dishes_mentioned: list[str] = Field(
        description="Specific dishes mentioned in the review"
    )

    @field_validator("rating")
    @classmethod
    def validate_rating(cls, v):
        if v < 1 or v > 5:
            raise ValueError(f"Rating must be 1-5, got {v}")
        return v

    @field_validator("price_range")
    @classmethod
    def validate_price_range(cls, v):
        valid = {"budget", "moderate", "upscale", "fine-dining"}
        if v.lower() not in valid:
            raise ValueError(f"Must be one of {valid}, got '{v}'")
        return v.lower()

# Build the graph using ExtractionState and the same
# extract -> route -> output/fallback pattern from the article.
# Replace FinancialReport with RestaurantReview in the extraction node.
