Project — Build a SQL Database Agent That Answers Business Questions
Your product manager sends a Slack message: “What were our top 5 products by revenue last quarter?” You could write the SQL yourself. Or you could build an agent that does it — every time, for any question, without you touching a query editor.
That’s what we’re building here. Not a toy demo that runs one hardcoded query. A real agent with error recovery, schema awareness, and the ability to explain its results in plain English. By the end, you’ll have a working system you can point at any SQLite database.
Before we write any code, here’s how the data flows through this agent.
A user types a business question in natural language. The first thing the agent does is inspect the database schema — it needs to know what tables and columns exist before it can write SQL. With that context, it generates a SQL query. But generated SQL isn’t always correct, so the agent validates the query for syntax errors before running it. If the query is valid, it executes against the database and gets results back. If it’s malformed, the agent rewrites it and tries again — up to three attempts.
Once results come back, the agent doesn’t dump raw rows on the user. It summarizes the findings in natural language, answering the original question directly.
That’s five stages: schema lookup, query generation, validation, execution (with retry), and summarization. Each one becomes a node in our LangGraph graph. The retry loop is a conditional edge that routes back to query generation when something fails. We’ll build each piece from scratch.
What Makes a SQL Agent Different from Simple Text-to-SQL?
Simple text-to-SQL is a one-shot translation. You send a question to the LLM, it writes SQL, you run it. If the SQL is wrong, you’re stuck. No retries, no schema awareness, no error recovery.
A SQL agent is fundamentally different. It operates as a multi-step reasoning loop. The agent inspects the database first, generates a query with full schema context, checks its own work, and handles failures autonomously. This isn’t just more code — it’s a different architecture.
Here’s what our agent will handle that a simple approach won’t:
- Schema discovery — the agent reads table names, column types, and sample data before writing any SQL
- Self-correction — when a query fails, the agent reads the error message and fixes the query itself
- Multi-step reasoning — complex questions might need the agent to check available tables, run an exploratory query, then run the final query
- Human-readable output — results come back as explanations, not raw database rows
Prerequisites
- Python version: 3.10+
- Required libraries: langgraph (0.4+), langchain-openai (0.3+), langchain-core (0.3+), langchain-community (0.3+)
- Install: pip install langgraph langchain-openai langchain-core langchain-community
- API key: An OpenAI API key set as OPENAI_API_KEY. See OpenAI’s docs to create one.
- Database: We’ll create a SQLite database from scratch — no external downloads needed.
- Time to complete: ~45 minutes
- Prior knowledge: LangGraph fundamentals (nodes, edges, state, tool calling) from earlier posts in this series.
Step 1 — Set Up the Database
Every SQL agent needs a database to query. We’ll create a realistic e-commerce database with four tables: customers, products, orders, and order_items. This gives us enough structure for interesting business questions without being overwhelming.
The database uses SQLite, so there’s nothing to install. Python includes SQLite out of the box. We’ll populate it with deterministic sample data — no random generation, so your results will match exactly.
import os
import sqlite3
from datetime import datetime
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage
from langgraph.graph import StateGraph, START, END
load_dotenv()
# Create an in-memory SQLite database
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
print("Database connection established")
Database connection established
With the connection open, we’ll create the four tables and populate them with sample data. The schema mirrors a real e-commerce system: customers place orders, each order has multiple line items, and each line item references a product.
# Create tables
cursor.executescript("""
CREATE TABLE customers (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
email TEXT NOT NULL,
city TEXT NOT NULL,
signup_date TEXT NOT NULL
);
CREATE TABLE products (
id INTEGER PRIMARY KEY,
name TEXT NOT NULL,
category TEXT NOT NULL,
price REAL NOT NULL
);
CREATE TABLE orders (
id INTEGER PRIMARY KEY,
customer_id INTEGER NOT NULL,
order_date TEXT NOT NULL,
status TEXT NOT NULL,
FOREIGN KEY (customer_id) REFERENCES customers(id)
);
CREATE TABLE order_items (
id INTEGER PRIMARY KEY,
order_id INTEGER NOT NULL,
product_id INTEGER NOT NULL,
quantity INTEGER NOT NULL,
FOREIGN KEY (order_id) REFERENCES orders(id),
FOREIGN KEY (product_id) REFERENCES products(id)
);
""")
print("Tables created: customers, products, orders, order_items")
Tables created: customers, products, orders, order_items
Now the sample data. I’m using enough rows to make queries interesting — 8 customers, 10 products, 12 orders, and 20 order items. The dates span Q3 and Q4 of 2025, so we can ask quarterly comparison questions.
# Insert customers
cursor.executemany(
"INSERT INTO customers VALUES (?, ?, ?, ?, ?)",
[
(1, "Alice Johnson", "alice@example.com", "New York", "2024-01-15"),
(2, "Bob Smith", "bob@example.com", "Chicago", "2024-03-22"),
(3, "Carol Davis", "carol@example.com", "New York", "2024-06-10"),
(4, "Dan Wilson", "dan@example.com", "Austin", "2024-02-28"),
(5, "Eva Martinez", "eva@example.com", "Chicago", "2024-07-04"),
(6, "Frank Lee", "frank@example.com", "Austin", "2024-09-15"),
(7, "Grace Kim", "grace@example.com", "New York", "2025-01-10"),
(8, "Henry Brown", "henry@example.com", "Chicago", "2025-04-20"),
],
)
# Insert products
cursor.executemany(
"INSERT INTO products VALUES (?, ?, ?, ?)",
[
(1, "Laptop Pro", "Electronics", 1299.99),
(2, "Wireless Mouse", "Electronics", 29.99),
(3, "Python Cookbook", "Books", 49.99),
(4, "Standing Desk", "Furniture", 599.99),
(5, "Monitor 27in", "Electronics", 349.99),
(6, "Keyboard Mech", "Electronics", 89.99),
(7, "Data Science Handbook", "Books", 39.99),
(8, "Desk Lamp", "Furniture", 45.99),
(9, "USB-C Hub", "Electronics", 59.99),
(10, "Webcam HD", "Electronics", 79.99),
],
)
conn.commit()
print(f"Inserted {cursor.execute('SELECT COUNT(*) FROM customers').fetchone()[0]} customers")
print(f"Inserted {cursor.execute('SELECT COUNT(*) FROM products').fetchone()[0]} products")
Inserted 8 customers
Inserted 10 products
The orders and order items tie everything together. Each order belongs to a customer, and each order item links an order to a product with a quantity.
# Insert orders (spanning Q3-Q4 2025)
cursor.executemany(
"INSERT INTO orders VALUES (?, ?, ?, ?)",
[
(1, 1, "2025-07-10", "completed"),
(2, 2, "2025-07-18", "completed"),
(3, 3, "2025-08-05", "completed"),
(4, 1, "2025-08-22", "completed"),
(5, 4, "2025-09-03", "completed"),
(6, 5, "2025-09-15", "completed"),
(7, 2, "2025-10-01", "completed"),
(8, 6, "2025-10-12", "completed"),
(9, 3, "2025-11-05", "completed"),
(10, 7, "2025-11-20", "completed"),
(11, 1, "2025-12-01", "completed"),
(12, 8, "2025-12-15", "cancelled"),
],
)
# Insert order items
cursor.executemany(
"INSERT INTO order_items VALUES (?, ?, ?, ?)",
[
(1, 1, 1, 1), (2, 1, 2, 2),
(3, 2, 3, 1), (4, 2, 6, 1),
(5, 3, 5, 1), (6, 3, 9, 2),
(7, 4, 2, 3), (8, 4, 7, 1),
(9, 5, 4, 1), (10, 5, 8, 2),
(11, 6, 1, 1), (12, 6, 6, 1),
(13, 7, 3, 2), (14, 7, 10, 1),
(15, 8, 5, 1), (16, 8, 2, 1),
(17, 9, 1, 1), (18, 9, 4, 1),
(19, 10, 7, 2), (20, 10, 9, 1),
],
)
conn.commit()
print(f"Inserted {cursor.execute('SELECT COUNT(*) FROM orders').fetchone()[0]} orders")
print(f"Inserted {cursor.execute('SELECT COUNT(*) FROM order_items').fetchone()[0]} order items")
Inserted 12 orders
Inserted 20 order items
Quick sanity check — let’s confirm the data works with a basic join. This query finds total revenue per product by joining orders, order items, and products.
result = cursor.execute("""
SELECT p.name, SUM(p.price * oi.quantity) as revenue
FROM order_items oi
JOIN products p ON oi.product_id = p.id
JOIN orders o ON oi.order_id = o.id
WHERE o.status = 'completed'
GROUP BY p.name
ORDER BY revenue DESC
LIMIT 5
""").fetchall()
for name, revenue in result:
print(f"{name}: ${revenue:,.2f}")
Laptop Pro: $3,899.97
Standing Desk: $1,199.98
Monitor 27in: $699.98
Keyboard Mech: $179.98
USB-C Hub: $179.97
The database is working. Laptop Pro leads revenue because it’s the highest-priced item and appears in three completed orders. We’ll use this database for all our agent queries.
Step 2 — Define the Agent State
The agent state tracks everything that flows between nodes. For our SQL agent, we need more than just messages. We need to store the database schema, the generated SQL, query results, the current error (if any), and a retry counter.
LangGraph uses TypedDict for state definitions. Each field in the state is accessible to every node, and nodes return dictionaries with the fields they want to update.
from typing import TypedDict, Optional
class SQLAgentState(TypedDict):
question: str
schema_info: str
generated_sql: str
sql_valid: bool
query_result: str
error_message: str
answer: str
retry_count: int
Eight fields. Here’s what each one does:
- question — the user’s natural language question
- schema_info — the database schema as a string (tables, columns, types, sample rows)
- generated_sql — the SQL query the LLM writes
- sql_valid — whether the query passed validation
- query_result — the raw result from executing the query
- error_message — the error text if something went wrong (empty string if no error)
- answer — the final natural language answer for the user
- retry_count — how many times we’ve retried query generation (caps at 3)
This flat state structure is intentional. I prefer keeping agent state as simple as possible — no nested dictionaries, no lists of intermediate results. Every node reads what it needs and writes what it produces. Clean and debuggable.
Step 3 — Build the Schema Inspector Node
The first node in our graph reads the database schema. The LLM can’t write good SQL without knowing what tables and columns exist. This node queries SQLite’s metadata tables and builds a formatted string with table names, column definitions, and sample data.
Why include sample data? Because column names alone are ambiguous. A column called status could hold anything — “active”/”inactive”, “pending”/”shipped”/”delivered”, or numeric codes. Showing 3 sample rows removes the guessing.
def get_schema_node(state: SQLAgentState) -> dict:
"""Read database schema and sample data."""
schema_parts = []
tables = cursor.execute(
"SELECT name FROM sqlite_master WHERE type='table'"
).fetchall()
for (table_name,) in tables:
# Get column info
columns = cursor.execute(
f"PRAGMA table_info({table_name})"
).fetchall()
col_defs = [f" {c[1]} {c[2]}" for c in columns]
# Get 3 sample rows
samples = cursor.execute(
f"SELECT * FROM {table_name} LIMIT 3"
).fetchall()
schema_parts.append(
f"TABLE: {table_name}\n"
f"COLUMNS:\n" + "\n".join(col_defs) + "\n"
f"SAMPLE ROWS: {samples}"
)
schema_info = "\n\n".join(schema_parts)
return {"schema_info": schema_info}
No LLM call here — this node is pure Python. It reads metadata from SQLite’s sqlite_master table and PRAGMA table_info. The output is a formatted string the LLM will use in the next step.
Let’s test it in isolation to see what the LLM will receive as context.
test_schema = get_schema_node({"question": "", "schema_info": "", "generated_sql": "", "sql_valid": False, "query_result": "", "error_message": "", "answer": "", "retry_count": 0})
print(test_schema["schema_info"][:500])
TABLE: customers
COLUMNS:
id INTEGER
name TEXT
email TEXT
city TEXT
signup_date TEXT
SAMPLE ROWS: [(1, 'Alice Johnson', 'alice@example.com', 'New York', '2024-01-15'), (2, 'Bob Smith', 'bob@example.com', 'Chicago', '2024-03-22'), (3, 'Carol Davis', 'carol@example.com', 'New York', '2024-06-10')]
TABLE: products
COLUMNS:
id INTEGER
name TEXT
category TEXT
price REAL
SAMPLE ROWS: [(1, 'Laptop Pro', 'Electronics', 1299.99), (2, 'Wireless
The schema output gives the LLM everything it needs: table names, column types, and real data samples. This is significantly better than just passing table names.
Step 4 — Build the Query Generator Node
This is where the LLM does its work. The query generator takes the user’s question and the database schema, then produces a SQL query. The system prompt is critical here — it tells the LLM exactly how to behave and what constraints to follow.
I’ve found that being extremely specific in the system prompt prevents most SQL generation errors. Telling the model “write valid SQLite syntax” isn’t enough. You need to say “use single quotes for strings, don’t use ILIKE (SQLite doesn’t support it), always qualify ambiguous column names with table aliases.”
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
def generate_query_node(state: SQLAgentState) -> dict:
"""Generate a SQL query from the user's question."""
system_prompt = f"""You are a SQL expert. Generate a SQLite query to answer the user's question.
DATABASE SCHEMA:
{state['schema_info']}
RULES:
- Return ONLY the SQL query, no markdown, no explanation
- Use SQLite syntax (no ILIKE, use LIKE with LOWER() instead)
- Always use table aliases in JOINs
- Use single quotes for string literals
- If the question is ambiguous, make reasonable assumptions
- For date comparisons, dates are stored as TEXT in 'YYYY-MM-DD' format
"""
if state.get("error_message"):
system_prompt += f"""
PREVIOUS ERROR — fix this issue:
{state['error_message']}
PREVIOUS QUERY THAT FAILED:
{state['generated_sql']}
"""
response = llm.invoke([
SystemMessage(content=system_prompt),
HumanMessage(content=state["question"]),
])
sql = response.content.strip()
# Strip markdown code fences if the LLM adds them
if sql.startswith("```"):
sql = sql.split("\n", 1)[1].rsplit("```", 1)[0].strip()
return {"generated_sql": sql}
Two things to notice here. First, the system prompt includes the full schema from the previous node. The LLM sees every table, column, and sample row. Second, if this is a retry (there’s an error message in the state), the prompt includes the failed query AND the error. This gives the LLM the context it needs to fix the problem rather than making the same mistake again.
The markdown stripping at the end is a practical necessity. Even with “no markdown” in the prompt, models sometimes wrap SQL in code fences. I’ve hit this with every model I’ve tested — GPT-4o, GPT-4o-mini, Claude. Stripping them prevents downstream parsing failures.
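If you want something more defensive than the inline two-step strip, a small regex helper does the same job. This is my own sketch, not part of the agent above — it handles labeled fences (```sql), bare fences, and unfenced output uniformly:

```python
import re

def strip_fences(text: str) -> str:
    """Remove a surrounding markdown code fence (```sql ... ```), if present."""
    match = re.match(r"^```\w*\n(.*?)\n?```\s*$", text.strip(), re.DOTALL)
    return match.group(1).strip() if match else text.strip()

print(strip_fences("```sql\nSELECT 1;\n```"))  # SELECT 1;
print(strip_fences("SELECT 1;"))               # SELECT 1;
```

Either approach works; the regex version just fails more gracefully when the model emits a fence label or trailing whitespace you didn’t anticipate.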
[COMMON MISTAKE]
Don’t include the full database content in the prompt. I’ve seen tutorials that dump entire tables into the system message. For our 8-customer database, that’s fine. For a production database with 100K rows, you’ll blow through the context window and get garbage results. Send schema + sample rows, never full tables.
Step 5 — Build the Validation Node
Before executing any generated SQL, we validate it. This catches syntax errors, missing table references, and other issues before they hit the database. SQLite’s EXPLAIN command is perfect for this — it parses the query without executing it.
def validate_query_node(state: SQLAgentState) -> dict:
"""Validate the generated SQL without executing it."""
sql = state["generated_sql"]
try:
cursor.execute(f"EXPLAIN {sql}")
return {"sql_valid": True, "error_message": ""}
except Exception as e:
return {
"sql_valid": False,
"error_message": f"Validation error: {str(e)}",
}
Short and focused. The EXPLAIN prefix tells SQLite to parse and plan the query without returning data. If the query has a syntax error or references a table that doesn’t exist, it raises an exception. We catch it and store the error message for the retry loop.
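To see the mechanism in isolation, here’s a throwaway demonstration — it uses its own in-memory database, not the tutorial’s connection — of EXPLAIN rejecting a typo’d table name without touching any data:

```python
import sqlite3

demo = sqlite3.connect(":memory:")
cur = demo.cursor()
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")

try:
    # EXPLAIN parses and plans the query, so the bad table name fails here
    cur.execute("EXPLAIN SELECT * FROM ordrs")
except sqlite3.OperationalError as e:
    print(f"Caught before execution: {e}")  # no such table: ordrs
```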
Step 6 — Build the Query Execution Node
When validation passes, this node runs the actual query against the database. It formats the results as a readable string with column headers.
def execute_query_node(state: SQLAgentState) -> dict:
"""Execute the validated SQL query."""
sql = state["generated_sql"]
try:
cursor.execute(sql)
columns = [desc[0] for desc in cursor.description]
rows = cursor.fetchall()
if not rows:
result = "Query returned no results."
else:
header = " | ".join(columns)
separator = "-" * len(header)
row_strings = [
" | ".join(str(val) for val in row) for row in rows
]
result = f"{header}\n{separator}\n" + "\n".join(row_strings)
return {"query_result": result, "error_message": ""}
except Exception as e:
return {
"query_result": "",
"error_message": f"Execution error: {str(e)}",
}
The formatting deserves a note. We extract column names from cursor.description and build a pipe-separated table. Named columns make the LLM’s summarization much more accurate than raw tuples.
Why catch exceptions here if we already validated? Because validation catches syntax errors, not runtime errors. A query might be syntactically perfect but fail at execution — dividing by zero in an aggregate, or a type mismatch in a WHERE clause.
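To make that distinction concrete, here’s a contrived demonstration on a separate in-memory database. The hypothetical user-defined function stands in for any computation that only fails on real data — the query passes EXPLAIN, then fails when actually run:

```python
import sqlite3

demo = sqlite3.connect(":memory:")

def fragile(x):
    """A stand-in for any computation that only fails on real data."""
    raise ValueError("boom")

demo.create_function("fragile", 1, fragile)
cur = demo.cursor()

cur.execute("EXPLAIN SELECT fragile(1)")  # validates fine: parsed, never run
try:
    cur.execute("SELECT fragile(1)")      # fails only at execution time
except sqlite3.OperationalError as e:
    print(f"Runtime error: {e}")
```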
Step 7 — Build the Summarization Node
The final processing node takes the raw query results and translates them into a natural language answer. This is what makes the agent useful for business users who don’t read SQL output.
def summarize_node(state: SQLAgentState) -> dict:
"""Summarize query results as a natural language answer."""
response = llm.invoke([
SystemMessage(content="""You are a data analyst presenting query results to a business user.
- Answer the original question directly
- Use specific numbers from the results
- Keep it concise — 2-4 sentences
- If the query returned no results, say so and suggest why
- Format currency with $ and commas
- Don't mention SQL, queries, or databases"""),
HumanMessage(content=f"""Original question: {state['question']}
Query results:
{state['query_result']}"""),
])
return {"answer": response.content}
The system prompt here is deliberate about what NOT to do. Business users don’t want to hear “the SQL query returned 5 rows.” They want “your top 5 products by revenue are…” The instruction to format currency and avoid technical language makes the output feel like it came from a human analyst.
Step 8 — Wire the Graph with Conditional Routing
This is where LangGraph shines. We connect all five nodes with edges, and add the critical retry loop. The routing function checks validation results and decides whether to execute the query or send it back for regeneration.
The routing logic handles three cases. If the query is valid, we proceed to execution. If it’s invalid and we haven’t exceeded 3 retries, we go back to query generation with the error message. If we’ve hit the retry limit, we bail out with an error message.
def route_after_validation(state: SQLAgentState) -> str:
"""Decide next step based on validation result."""
if state["sql_valid"]:
return "execute"
elif state["retry_count"] < 3:
return "retry"
else:
return "give_up"
def increment_retry(state: SQLAgentState) -> dict:
"""Increment the retry counter before regenerating."""
return {"retry_count": state["retry_count"] + 1}
Two functions here. The router returns a string label that maps to the next node. The increment_retry function is a tiny node that bumps the counter before we loop back to query generation. Without it, the agent would retry forever.
Now we build the graph. Each add_node call registers a function as a named step. The add_conditional_edges call creates the branching logic after validation.
def give_up_node(state: SQLAgentState) -> dict:
"""Return an error message when retries are exhausted."""
return {
"answer": (
f"I wasn't able to answer your question after "
f"{state['retry_count']} attempts. The last error "
f"was: {state['error_message']}"
)
}
# Build the graph
graph = StateGraph(SQLAgentState)
# Add nodes
graph.add_node("get_schema", get_schema_node)
graph.add_node("generate_query", generate_query_node)
graph.add_node("validate_query", validate_query_node)
graph.add_node("execute_query", execute_query_node)
graph.add_node("summarize", summarize_node)
graph.add_node("increment_retry", increment_retry)
graph.add_node("give_up", give_up_node)
# Add edges
graph.add_edge(START, "get_schema")
graph.add_edge("get_schema", "generate_query")
graph.add_edge("generate_query", "validate_query")
graph.add_conditional_edges(
"validate_query",
route_after_validation,
{
"execute": "execute_query",
"retry": "increment_retry",
"give_up": "give_up",
},
)
graph.add_edge("increment_retry", "generate_query")
graph.add_edge("execute_query", "summarize")
graph.add_edge("summarize", END)
graph.add_edge("give_up", END)
# Compile
sql_agent = graph.compile()
print("Graph compiled successfully")
Graph compiled successfully
The graph has a clear flow: START → schema → generate → validate → (execute or retry) → summarize → END. The retry loop from validate_query back through increment_retry to generate_query is the self-correction mechanism. It’s what separates this from a one-shot text-to-SQL call.
Step 9 — Test the SQL Agent on Business Questions
Time to test. We’ll run the LangGraph SQL agent on four progressively harder questions to see how it handles different SQL patterns — aggregation, joins, filtering, and multi-table reasoning.
The helper function below invokes the graph and prints both the generated SQL and the final answer. Seeing the SQL helps you verify the agent is writing correct queries.
def ask(question: str) -> str:
"""Run the SQL agent and return the answer."""
result = sql_agent.invoke({
"question": question,
"schema_info": "",
"generated_sql": "",
"sql_valid": False,
"query_result": "",
"error_message": "",
"answer": "",
"retry_count": 0,
})
print(f"Question: {question}")
print(f"SQL: {result['generated_sql']}")
print(f"Answer: {result['answer']}")
print("-" * 60)
return result["answer"]
Question 1 — Simple aggregation: How many customers do we have?
ask("How many customers do we have?")
Question: How many customers do we have?
SQL: SELECT COUNT(*) AS customer_count FROM customers
Answer: You currently have 8 customers in the system.
------------------------------------------------------------
Clean and direct. The agent wrote a simple COUNT query, and the summarizer delivered a one-sentence answer.
Question 2 — Join with aggregation: What are the top 3 products by revenue?
ask("What are our top 3 products by total revenue?")
The agent should produce a three-table join through order_items. Watch for whether it filters out the cancelled order (order #12 has status “cancelled”).
Question: What are our top 3 products by total revenue?
SQL: SELECT p.name, SUM(p.price * oi.quantity) AS total_revenue FROM order_items oi JOIN products p ON oi.product_id = p.id JOIN orders o ON oi.order_id = o.id WHERE o.status = 'completed' GROUP BY p.name ORDER BY total_revenue DESC LIMIT 3
Answer: Your top 3 products by total revenue are Laptop Pro at $3,899.97, Standing Desk at $1,199.98, and Monitor 27in at $699.98.
------------------------------------------------------------
The revenue numbers are correct — Laptop Pro appears in three completed orders (1, 6, and 9) at $1,299.99 each. The agent correctly excluded the cancelled order. That’s exactly the query a human analyst would write.
[UNDER THE HOOD]
Why does the agent use table aliases? Our system prompt explicitly requires them: “always use table aliases in JOINs.” Without this instruction, the LLM often writes products.name instead of p.name, which works but produces harder-to-read SQL. Aliases also prevent ambiguous column errors when two tables share column names like id.
Question 3 — Filtering with dates: How many orders were placed in Q4 2025?
ask("How many orders were placed in Q4 2025 (October through December)?")
Date handling is a common failure point for text-to-SQL systems. SQLite stores dates as text, so the agent needs string comparison on the YYYY-MM-DD format.
Question: How many orders were placed in Q4 2025 (October through December)?
SQL: SELECT COUNT(*) AS order_count FROM orders WHERE order_date >= '2025-10-01' AND order_date <= '2025-12-31'
Answer: There were 6 orders placed in Q4 2025, from October through December.
------------------------------------------------------------
Six is correct — orders 7 through 12 all fall in Q4. Order 12 (December 15) counts even though it was cancelled, because the question asks about orders placed, not completed. This is a subtle distinction the agent handled properly.
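The reason plain string comparison works here is that ISO-8601 dates sort lexicographically — a quick sanity check in plain Python:

```python
# 'YYYY-MM-DD' strings compare the same way the underlying dates do,
# which is why the agent's TEXT-based range filter is safe.
order_dates = ["2025-07-10", "2025-10-01", "2025-11-20", "2025-12-15"]
q4 = [d for d in order_dates if "2025-10-01" <= d <= "2025-12-31"]
print(q4)  # ['2025-10-01', '2025-11-20', '2025-12-15']
```

This property breaks with formats like MM/DD/YYYY, which is one reason to normalize dates before they ever reach the database.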
Question 4 — Multi-step reasoning: Which city has the highest average order value?
ask("Which city has the highest average order value?")
This question forces a subquery. The agent must first compute each order’s total revenue, then average those totals by customer city. Two levels of aggregation.
Question: Which city has the highest average order value?
SQL: SELECT c.city, AVG(order_total) AS avg_order_value FROM (SELECT o.id, o.customer_id, SUM(p.price * oi.quantity) AS order_total FROM orders o JOIN order_items oi ON oi.order_id = o.id JOIN products p ON p.id = oi.product_id WHERE o.status = 'completed' GROUP BY o.id, o.customer_id) sub JOIN customers c ON c.id = sub.customer_id GROUP BY c.city ORDER BY avg_order_value DESC LIMIT 1
Answer: New York has the highest average order value at $799.97.
------------------------------------------------------------
The subquery approach is clean — first calculate per-order revenue, then group by city. New York wins because Carol’s November order combined a Laptop Pro ($1,299.99) and a Standing Desk ($599.99), pulling the average up.
Common Mistakes When Building a SQL Database Agent
Before the mistakes themselves, here’s how this architecture stacks up against the alternatives:
| Approach | Error Recovery | Schema Aware | Multi-Turn | Latency |
|---|---|---|---|---|
| Raw SQL | Manual | No | No | Instant |
| Simple text-to-SQL | None | Partial | No | ~1 second |
| LangGraph SQL agent | Automatic (3 retries) | Full (with samples) | With modification | ~3-5 seconds |
| LangChain SQL toolkit | Built-in | Yes | Via memory | ~3-5 seconds |
Our LangGraph approach gives you full control over each node. The LangChain SQL toolkit is more convenient out of the box, but harder to customize when you need non-standard validation or routing.
Mistake 1: Missing schema context in the prompt
❌ Wrong:
# Sending a question without schema info
response = llm.invoke([
HumanMessage(content="What are the top products?")
])
# LLM guesses table/column names — often wrong
Why it’s wrong: Without schema context, the LLM invents table and column names. It might write SELECT * FROM product when the table is actually called products. These errors are silent until execution.
✅ Correct:
# Always include schema in the system prompt
response = llm.invoke([
SystemMessage(content=f"Schema: {schema_info}"),
HumanMessage(content="What are the top products?"),
])
Mistake 2: No retry loop for failed queries
❌ Wrong:
# One-shot execution — crashes on any SQL error
sql = generate_sql(question)
result = cursor.execute(sql).fetchall() # Might crash
Why it’s wrong: LLMs generate incorrect SQL roughly 10-20% of the time, even with good prompts. Without retries, one bad query kills the entire pipeline.
✅ Correct:
# Our agent's approach: validate, then retry with error context
# The graph handles this automatically via conditional edges
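For comparison outside the graph, the retry idea can be sketched as a plain loop. This is a minimal sketch: `generate_sql` here is a hypothetical callable that receives the previous error, standing in for the LLM generation node:

```python
import sqlite3

def run_with_retries(generate_sql, cursor, question, max_retries=3):
    """Feed each failure back into generation, like the graph's retry edge."""
    error = ""
    for _ in range(max_retries + 1):
        sql = generate_sql(question, error)
        try:
            return cursor.execute(sql).fetchall()
        except sqlite3.Error as e:
            error = str(e)  # becomes context for the next attempt
    raise RuntimeError(f"Gave up after {max_retries + 1} attempts: {error}")
```

The LangGraph version expresses the same control flow declaratively, which makes it easier to observe, checkpoint, and extend.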
Mistake 3: Executing raw LLM output without validation
❌ Wrong:
# Running whatever the LLM outputs — dangerous
sql = response.content
cursor.execute(sql) # Could be DROP TABLE, DELETE, etc.
Why it’s wrong: The LLM might generate destructive queries. In production, always use a read-only database connection and validate queries before execution.
✅ Correct:
# Validate first with EXPLAIN
try:
cursor.execute(f"EXPLAIN {sql}")
except Exception:
# Route to retry, don't execute
pass
Exercise 1 — Add a Query Complexity Check
You’ve seen how the agent validates SQL for syntax errors. But what about queries that are valid SQL but unreasonably expensive? A SELECT * on a million-row table without a LIMIT clause could hang your database.
Your task: add a check_complexity function that inspects the generated SQL and rejects queries missing a LIMIT clause when they use SELECT without a WHERE condition. The function should return a dictionary with sql_valid set to False and an appropriate error message when the check fails.
# Exercise: Complete this function
def check_complexity(state):
sql = state["generated_sql"].upper()
# Your code here:
# 1. Check if the query has SELECT but no WHERE and no LIMIT
# 2. If so, return {"sql_valid": False, "error_message": "..."}
# 3. Otherwise, return {"sql_valid": True, "error_message": ""}
pass
# Test cases:
# check_complexity({"generated_sql": "SELECT * FROM orders"})
# -> {"sql_valid": False, "error_message": "Query needs a WHERE clause or LIMIT"}
#
# check_complexity({"generated_sql": "SELECT * FROM orders WHERE status = 'completed'"})
# -> {"sql_valid": True, "error_message": ""}
#
# check_complexity({"generated_sql": "SELECT * FROM orders LIMIT 10"})
# -> {"sql_valid": True, "error_message": ""}
Hint 1
Check whether `"WHERE"` or `"LIMIT"` appears in the uppercased SQL string. If neither is present and the query starts with `"SELECT"`, it's too broad.
Hint 2 (nearly the answer)
has_where = "WHERE" in sql
has_limit = "LIMIT" in sql
if sql.startswith("SELECT") and not has_where and not has_limit:
return {"sql_valid": False, "error_message": "..."}
Solution
def check_complexity(state):
sql = state["generated_sql"].upper().strip()
has_where = "WHERE" in sql
has_limit = "LIMIT" in sql
if sql.startswith("SELECT") and not has_where and not has_limit:
return {
"sql_valid": False,
"error_message": "Query needs a WHERE clause or LIMIT to prevent unbounded scans",
}
return {"sql_valid": True, "error_message": ""}
# Test
print(check_complexity({"generated_sql": "SELECT * FROM orders"}))
print(check_complexity({"generated_sql": "SELECT * FROM orders WHERE status = 'completed'"}))
print(check_complexity({"generated_sql": "SELECT * FROM orders LIMIT 10"}))
{'sql_valid': False, 'error_message': 'Query needs a WHERE clause or LIMIT to prevent unbounded scans'}
{'sql_valid': True, 'error_message': ''}
{'sql_valid': True, 'error_message': ''}
This function adds a safety check that prevents unbounded table scans. In a production agent, you’d integrate this as an additional validation step in the graph — either as a separate node or combined with the existing `validate_query_node`.
Exercise 2 — Support Follow-Up Questions
Right now, our agent handles each question independently. But in a real analytics workflow, users ask follow-up questions: “What about Q3?” or “Break that down by category.” Your task is to modify the ask function to maintain conversation context.
The key insight: you need to pass the previous question and answer to the LLM so it can resolve references like “that” and “those products.”
# Exercise: Modify this function to support follow-ups
conversation_history = []
def ask_with_context(question: str) -> str:
# Your code here:
# 1. Build a context string from conversation_history
# 2. Modify the question to include context
# 3. Run the agent
# 4. Append Q&A to conversation_history
# 5. Return the answer
pass
# Test sequence:
# ask_with_context("What are our top 3 products by revenue?")
# ask_with_context("What about just electronics?") # Should filter by category
Hint 1
Prepend the conversation history to the question. Format it as “Previous Q&A:\n Q: … A: …\n\nCurrent question: …” so the LLM understands the context.
Hint 2 (nearly the answer)
context = "\n".join([f"Q: {q}\nA: {a}" for q, a in conversation_history])
augmented_question = f"Previous conversation:\n{context}\n\nNew question: {question}"
Solution
conversation_history = []

def ask_with_context(question: str) -> str:
    if conversation_history:
        context = "\n".join(
            [f"Q: {q}\nA: {a}" for q, a in conversation_history]
        )
        augmented = f"Previous conversation:\n{context}\n\nNew question: {question}"
    else:
        augmented = question
    result = sql_agent.invoke({
        "question": augmented,
        "schema_info": "",
        "generated_sql": "",
        "sql_valid": False,
        "query_result": "",
        "error_message": "",
        "answer": "",
        "retry_count": 0,
    })
    conversation_history.append((question, result["answer"]))
    print(f"Q: {question}")
    print(f"A: {result['answer']}")
    print("-" * 60)
    return result["answer"]
The key trick is augmenting the question with prior context rather than modifying the agent’s internals. The LLM reads the history and resolves references like “those” or “that category” into concrete SQL filters.
When NOT to Use a LangGraph SQL Agent
I see developers reach for SQL agents whenever they have a database, but this architecture has clear boundaries. It works well for read-only analytics queries against structured databases. It doesn’t fit every situation.
Don’t use it for write operations. Our agent generates and executes SQL. If the LLM writes an UPDATE, DELETE, or DROP statement, you’ve got a problem. In production, connect to a read-only replica or use a database user with SELECT-only permissions.
Don’t use it for real-time dashboards. Each agent invocation makes 2-3 LLM calls. That’s 2-5 seconds of latency. For dashboards that refresh every few seconds, pre-written queries or a BI tool like Metabase are better.
Don’t use it for complex analytical workflows. Questions like “run a regression on last year’s sales data” aren’t SQL problems. They need pandas, scikit-learn, or a code execution agent. A SQL agent is designed for questions that have a single SQL answer.
Don’t use it with untrusted users without guardrails. Prompt injection attacks can trick the LLM into generating harmful queries. If external users interact with the agent, add query whitelisting, parameterization, and strict output validation.
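One cheap guardrail from that list is a table allowlist. The sketch below pulls identifiers after FROM/JOIN with a regex — the allowed names match this tutorial's demo schema, and a real SQL parser (sqlglot, for example) would handle aliases, CTEs, and subqueries far more reliably than a regex can.

```python
import re

# Allowlist matching the tutorial's demo schema
ALLOWED_TABLES = {"customers", "products", "orders", "order_items"}

def tables_allowed(sql: str) -> bool:
    """Crude guardrail: every table referenced after FROM/JOIN must be allowlisted.

    A regex cannot see aliases, CTE names, or nested queries; treat this as a
    first filter, not a complete defense.
    """
    referenced = {
        name.lower()
        for name in re.findall(r"\b(?:FROM|JOIN)\s+([A-Za-z_][A-Za-z0-9_]*)",
                               sql, re.IGNORECASE)
    }
    return referenced <= ALLOWED_TABLES

print(tables_allowed("SELECT * FROM orders o JOIN customers c ON c.id = o.customer_id"))  # True
print(tables_allowed("SELECT * FROM sqlite_master"))  # False
```

A check like this slots naturally next to the `check_permissions` node from the practice exercise: keyword filtering blocks destructive verbs, the allowlist blocks reads of tables the agent has no business touching.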
What Would Be Different in Production
The agent we built handles the happy path and common error cases. For production deployment, you’d add several hardening layers. I’d prioritize them in this order.
Read-only database connections. Use a database user with SELECT-only grants. This prevents the LLM from generating destructive queries, even if prompt injection is attempted.
Query timeouts. Wrap query execution in a timeout. A 30-second ceiling prevents runaway queries from locking your database.
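SQLite has no server-side statement timeout, but the stdlib driver can abort a running statement through a progress handler. A sketch of that idea — the 10,000-instruction callback interval is an arbitrary choice:

```python
import sqlite3
import time

def execute_with_timeout(conn: sqlite3.Connection, sql: str, timeout_s: float):
    """Abort a running statement once the deadline passes.

    SQLite invokes the progress handler every N virtual-machine instructions;
    a truthy return value interrupts the statement with OperationalError.
    """
    deadline = time.monotonic() + timeout_s
    conn.set_progress_handler(lambda: time.monotonic() > deadline, 10_000)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.set_progress_handler(None, 0)  # remove the handler

conn = sqlite3.connect(":memory:")
print(execute_with_timeout(conn, "SELECT 1 + 1", timeout_s=5.0))  # [(2,)]

# A deliberately slow query: a recursive CTE counting to 50 million
slow = """WITH RECURSIVE c(x) AS (SELECT 1 UNION ALL SELECT x + 1 FROM c WHERE x < 50000000)
SELECT count(*) FROM c"""
try:
    execute_with_timeout(conn, slow, timeout_s=0.05)
except sqlite3.OperationalError:
    print("Query timed out and was aborted")
```

On PostgreSQL you'd set `statement_timeout` on the session or role instead and let the server enforce the ceiling.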
Result size limits. Add LIMIT 1000 to every query that doesn’t already have one. Large result sets waste LLM tokens and slow down summarization.
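A minimal version of that rule is a wrapper that appends the LIMIT before execution. This is a naive string check — a parser would distinguish a real LIMIT clause from the word appearing inside a string literal — but it covers the common case:

```python
def enforce_limit(sql: str, max_rows: int = 1000) -> str:
    """Append a LIMIT to queries that don't already have one.

    Naive substring check; a SQL parser such as sqlglot would be more robust.
    """
    if "limit" in sql.lower():
        return sql
    return f"{sql.rstrip().rstrip(';')} LIMIT {max_rows}"

print(enforce_limit("SELECT * FROM orders"))           # SELECT * FROM orders LIMIT 1000
print(enforce_limit("SELECT * FROM orders LIMIT 10"))  # unchanged
```

The natural place to call this is at the top of `execute_query_node`, right before the query runs.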
Caching. Identical questions should return cached results. Use the question text (or its hash) as the cache key. This cuts costs and latency for repeated queries.
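A sketch of that cache, written as a wrapper so it works with any ask-style function — the backend below is a stub standing in for the real agent:

```python
import hashlib

def make_cached_ask(backend):
    """Wrap an ask-style function with an in-memory cache keyed by question hash."""
    cache: dict[str, str] = {}

    def ask_cached(question: str) -> str:
        # Normalize before hashing so trivial variants hit the same entry
        key = hashlib.sha256(question.strip().lower().encode()).hexdigest()
        if key not in cache:
            cache[key] = backend(question)
        return cache[key]

    return ask_cached

calls = []
fake_backend = lambda q: calls.append(q) or f"answer to {q!r}"  # stub agent
ask_cached = make_cached_ask(fake_backend)
ask_cached("How many customers do we have?")
ask_cached("how many customers do we have?  ")  # normalized: served from cache
print(len(calls))  # 1 — the backend ran once
```

In production you'd swap the dict for Redis or similar and add a TTL, since yesterday's cached answer may be stale once new orders land.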
Logging and observability. Log every question, generated SQL, execution time, and result. LangSmith is the natural choice for LangGraph applications — it traces every node execution and shows you where failures happen.
Schema caching. Our agent reads the schema on every invocation. In production, cache the schema and refresh it on a schedule (e.g., every hour). Schema changes are rare.
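That refresh-on-a-schedule pattern is a few lines of TTL caching. A sketch — in `get_schema_node` you'd call `cache.get()` instead of querying sqlite_master on every invocation:

```python
import time

class SchemaCache:
    """Cache the schema string, re-reading it only after the TTL expires."""

    def __init__(self, loader, ttl_seconds: float = 3600.0):
        self._loader = loader          # callable that reads the live schema
        self._ttl = ttl_seconds
        self._value = None
        self._loaded_at = 0.0

    def get(self) -> str:
        if self._value is None or time.monotonic() - self._loaded_at > self._ttl:
            self._value = self._loader()
            self._loaded_at = time.monotonic()
        return self._value

    def invalidate(self) -> None:
        """Force a re-read on the next get(), e.g. after a deployment."""
        self._value = None

loads = []
cache = SchemaCache(lambda: loads.append(1) or "TABLE: orders ...", ttl_seconds=3600)
cache.get(); cache.get(); cache.get()
print(len(loads))  # 1 — the loader ran once
```

Pair `invalidate()` with your deploy pipeline and the renamed-column failure mode in the FAQ below becomes a one-request blip instead of an hour of errors.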
Complete Code
Click to expand the full script (copy-paste and run)
# Complete code from: Build a SQL Database Agent That Answers Business Questions
# Requires: pip install langgraph langchain-openai langchain-core langchain-community python-dotenv
# Python 3.10+, OpenAI API key in .env file
import sqlite3
from typing import TypedDict
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage
from langgraph.graph import StateGraph, START, END
load_dotenv()
# --- Database Setup ---
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.executescript("""
CREATE TABLE customers (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    email TEXT NOT NULL, city TEXT NOT NULL, signup_date TEXT NOT NULL
);
CREATE TABLE products (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    category TEXT NOT NULL, price REAL NOT NULL
);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL,
    order_date TEXT NOT NULL, status TEXT NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES customers(id)
);
CREATE TABLE order_items (
    id INTEGER PRIMARY KEY, order_id INTEGER NOT NULL,
    product_id INTEGER NOT NULL, quantity INTEGER NOT NULL,
    FOREIGN KEY (order_id) REFERENCES orders(id),
    FOREIGN KEY (product_id) REFERENCES products(id)
);
""")
cursor.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?)", [
    (1, "Alice Johnson", "alice@example.com", "New York", "2024-01-15"),
    (2, "Bob Smith", "bob@example.com", "Chicago", "2024-03-22"),
    (3, "Carol Davis", "carol@example.com", "New York", "2024-06-10"),
    (4, "Dan Wilson", "dan@example.com", "Austin", "2024-02-28"),
    (5, "Eva Martinez", "eva@example.com", "Chicago", "2024-07-04"),
    (6, "Frank Lee", "frank@example.com", "Austin", "2024-09-15"),
    (7, "Grace Kim", "grace@example.com", "New York", "2025-01-10"),
    (8, "Henry Brown", "henry@example.com", "Chicago", "2025-04-20"),
])
cursor.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", [
    (1, "Laptop Pro", "Electronics", 1299.99),
    (2, "Wireless Mouse", "Electronics", 29.99),
    (3, "Python Cookbook", "Books", 49.99),
    (4, "Standing Desk", "Furniture", 599.99),
    (5, "Monitor 27in", "Electronics", 349.99),
    (6, "Keyboard Mech", "Electronics", 89.99),
    (7, "Data Science Handbook", "Books", 39.99),
    (8, "Desk Lamp", "Furniture", 45.99),
    (9, "USB-C Hub", "Electronics", 59.99),
    (10, "Webcam HD", "Electronics", 79.99),
])
cursor.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, 1, "2025-07-10", "completed"), (2, 2, "2025-07-18", "completed"),
    (3, 3, "2025-08-05", "completed"), (4, 1, "2025-08-22", "completed"),
    (5, 4, "2025-09-03", "completed"), (6, 5, "2025-09-15", "completed"),
    (7, 2, "2025-10-01", "completed"), (8, 6, "2025-10-12", "completed"),
    (9, 3, "2025-11-05", "completed"), (10, 7, "2025-11-20", "completed"),
    (11, 1, "2025-12-01", "completed"), (12, 8, "2025-12-15", "cancelled"),
])
cursor.executemany("INSERT INTO order_items VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 1), (2, 1, 2, 2), (3, 2, 3, 1), (4, 2, 6, 1),
    (5, 3, 5, 1), (6, 3, 9, 2), (7, 4, 2, 3), (8, 4, 7, 1),
    (9, 5, 4, 1), (10, 5, 8, 2), (11, 6, 1, 1), (12, 6, 6, 1),
    (13, 7, 3, 2), (14, 7, 10, 1), (15, 8, 5, 1), (16, 8, 2, 1),
    (17, 9, 1, 1), (18, 9, 4, 1), (19, 10, 7, 2), (20, 10, 9, 1),
])
conn.commit()
# --- State Definition ---
class SQLAgentState(TypedDict):
    question: str
    schema_info: str
    generated_sql: str
    sql_valid: bool
    query_result: str
    error_message: str
    answer: str
    retry_count: int
# --- LLM ---
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# --- Node Functions ---
def get_schema_node(state: SQLAgentState) -> dict:
    schema_parts = []
    tables = cursor.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall()
    for (table_name,) in tables:
        columns = cursor.execute(f"PRAGMA table_info({table_name})").fetchall()
        col_defs = [f" {c[1]} {c[2]}" for c in columns]
        samples = cursor.execute(f"SELECT * FROM {table_name} LIMIT 3").fetchall()
        schema_parts.append(
            f"TABLE: {table_name}\nCOLUMNS:\n" + "\n".join(col_defs) +
            f"\nSAMPLE ROWS: {samples}"
        )
    return {"schema_info": "\n\n".join(schema_parts)}

def generate_query_node(state: SQLAgentState) -> dict:
    system_prompt = f"""You are a SQL expert. Generate a SQLite query to answer the user's question.
DATABASE SCHEMA:
{state['schema_info']}
RULES:
- Return ONLY the SQL query, no markdown, no explanation
- Use SQLite syntax (no ILIKE, use LIKE with LOWER() instead)
- Always use table aliases in JOINs
- Use single quotes for string literals
- For date comparisons, dates are stored as TEXT in 'YYYY-MM-DD' format"""
    if state.get("error_message"):
        system_prompt += f"\n\nPREVIOUS ERROR:\n{state['error_message']}\nFAILED QUERY:\n{state['generated_sql']}"
    response = llm.invoke([SystemMessage(content=system_prompt), HumanMessage(content=state["question"])])
    sql = response.content.strip()
    if sql.startswith("```"):
        sql = sql.split("\n", 1)[1].rsplit("```", 1)[0].strip()
    return {"generated_sql": sql}

def validate_query_node(state: SQLAgentState) -> dict:
    try:
        cursor.execute(f"EXPLAIN {state['generated_sql']}")
        return {"sql_valid": True, "error_message": ""}
    except Exception as e:
        return {"sql_valid": False, "error_message": f"Validation error: {str(e)}"}

def execute_query_node(state: SQLAgentState) -> dict:
    try:
        cursor.execute(state["generated_sql"])
        columns = [desc[0] for desc in cursor.description]
        rows = cursor.fetchall()
        if not rows:
            return {"query_result": "Query returned no results.", "error_message": ""}
        header = " | ".join(columns)
        row_strings = [" | ".join(str(val) for val in row) for row in rows]
        result = f"{header}\n{'-' * len(header)}\n" + "\n".join(row_strings)
        return {"query_result": result, "error_message": ""}
    except Exception as e:
        return {"query_result": "", "error_message": f"Execution error: {str(e)}"}

def summarize_node(state: SQLAgentState) -> dict:
    response = llm.invoke([
        SystemMessage(content="You are a data analyst. Answer the question using the query results. Be concise (2-4 sentences). Use $ for currency. Don't mention SQL or databases."),
        HumanMessage(content=f"Question: {state['question']}\n\nResults:\n{state['query_result']}"),
    ])
    return {"answer": response.content}

def route_after_validation(state: SQLAgentState) -> str:
    if state["sql_valid"]:
        return "execute"
    elif state["retry_count"] < 3:
        return "retry"
    return "give_up"

def increment_retry(state: SQLAgentState) -> dict:
    return {"retry_count": state["retry_count"] + 1}

def give_up_node(state: SQLAgentState) -> dict:
    return {"answer": f"Unable to answer after {state['retry_count']} attempts. Last error: {state['error_message']}"}
# --- Build Graph ---
graph = StateGraph(SQLAgentState)
graph.add_node("get_schema", get_schema_node)
graph.add_node("generate_query", generate_query_node)
graph.add_node("validate_query", validate_query_node)
graph.add_node("execute_query", execute_query_node)
graph.add_node("summarize", summarize_node)
graph.add_node("increment_retry", increment_retry)
graph.add_node("give_up", give_up_node)
graph.add_edge(START, "get_schema")
graph.add_edge("get_schema", "generate_query")
graph.add_edge("generate_query", "validate_query")
graph.add_conditional_edges("validate_query", route_after_validation, {
    "execute": "execute_query", "retry": "increment_retry", "give_up": "give_up",
})
graph.add_edge("increment_retry", "generate_query")
graph.add_edge("execute_query", "summarize")
graph.add_edge("summarize", END)
graph.add_edge("give_up", END)
sql_agent = graph.compile()
# --- Run ---
def ask(question: str) -> str:
    result = sql_agent.invoke({
        "question": question, "schema_info": "", "generated_sql": "",
        "sql_valid": False, "query_result": "", "error_message": "",
        "answer": "", "retry_count": 0,
    })
    print(f"Q: {question}\nSQL: {result['generated_sql']}\nA: {result['answer']}\n")
    return result["answer"]
ask("How many customers do we have?")
ask("What are our top 3 products by total revenue?")
ask("How many orders were placed in Q4 2025?")
ask("Which city has the highest average order value?")
print("Script completed successfully.")
Summary
You built a complete SQL database agent using LangGraph. The agent takes natural language questions, inspects the database schema, generates SQL, validates and executes the query with automatic retries, and summarizes results for business users.
The key architectural decisions were:
- Custom TypedDict state over MessagesState — because the agent has distinct processing stages, not a free-form conversation
- Schema inspection as the first node — giving the LLM table structure and sample data dramatically improves SQL quality
- Validate before execute — catching errors before they hit the database and routing to a retry loop
- Conditional edges for self-correction — the retry mechanism is what makes this an agent rather than a script
To take this further, try connecting it to a PostgreSQL or MySQL database by swapping the SQLite connection. Add human-in-the-loop approval (covered in post 13 of this series) before executing queries on sensitive data. Or extend the state to track conversation history for multi-turn analytics sessions.
Practice exercise: Extend the agent with a check_permissions node that rejects queries containing INSERT, UPDATE, DELETE, DROP, ALTER, or TRUNCATE keywords. Add it between the generate_query and validate_query nodes in the graph.
Solution
FORBIDDEN_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "TRUNCATE", "CREATE"}

def check_permissions_node(state: SQLAgentState) -> dict:
    sql_upper = state["generated_sql"].upper().strip()
    first_word = sql_upper.split()[0] if sql_upper else ""
    if first_word in FORBIDDEN_KEYWORDS:
        return {
            "sql_valid": False,
            "error_message": f"Query rejected: {first_word} operations are not permitted. Only SELECT queries are allowed.",
        }
    return {"sql_valid": True, "error_message": ""}
# Add to graph between generate_query and validate_query
# (and remove the direct generate_query -> validate_query edge):
# graph.add_edge("generate_query", "check_permissions")
# graph.add_conditional_edges("check_permissions", route_after_validation, {
#     "execute": "validate_query", "retry": "increment_retry", "give_up": "give_up",
# })
This adds a security layer that catches destructive queries before they even reach validation. In production, combine this with a read-only database user for defense in depth.
Frequently Asked Questions
Can this agent work with PostgreSQL or MySQL instead of SQLite?
Yes. Replace the sqlite3 connection with psycopg2 (PostgreSQL) or mysql-connector-python (MySQL). You’ll also need to update the schema inspection node — PostgreSQL uses information_schema.columns instead of PRAGMA table_info, and MySQL uses DESCRIBE table_name. The LLM prompt should specify the SQL dialect too.
How do I prevent SQL injection when using an LLM-generated query?
The primary defense is a read-only database connection. Create a database user with only SELECT privileges. Additionally, set query timeouts (30 seconds is a good default), add row limits to prevent data exfiltration, and consider maintaining an allowlist of permitted table names. Our validation node catches syntax errors, but it won’t stop semantically valid but malicious queries.
What happens if the database schema changes?
Our agent reads the schema fresh on every invocation, so it automatically picks up new tables and columns. In production with schema caching, set a cache TTL (e.g., 1 hour) and invalidate on deployment. If a column is renamed, the agent will generate errors until the cache refreshes — another reason the retry loop is essential.
Can the agent handle queries that need multiple SQL statements?
Not in this implementation. The agent generates and executes a single query per question. For multi-statement analysis (“compare Q3 vs Q4 revenue”), the LLM usually writes a single query with CASE expressions or subqueries. For truly multi-step analysis, extend the state with a queries list and add a planning node that breaks the question into sequential sub-queries.
References
- LangGraph Documentation — Build a SQL Agent. Link
- LangChain Documentation — SQL Database Toolkit. Link
- SQLite Documentation — EXPLAIN Query Plan. Link
- Yao, S. et al. (2022). “ReAct: Synergizing Reasoning and Acting in Language Models.” arXiv:2210.03629. Link
- LangGraph GitHub — SQL Agent Example. Link
- Python Documentation — sqlite3 Module. Link
- OpenAI Documentation — Function Calling. Link
- LangChain Blog — Building Data Agents. Link
Reviewed: March 2026 | LangGraph version: 0.4+ | langchain-openai: 0.3+