Project — Build a Customer Support Agent with Escalation Workflows in LangGraph

Written by Selva Prabhakaran | 29 min read

A customer writes in: “I was charged twice for my order.” Your agent needs to check the knowledge base, realize this isn’t a simple FAQ, and escalate to a human — with full context. Not just forward the message. Actually hand it off with sentiment, category, and a suggested resolution.

Most LangGraph tutorials stop at “agent calls a tool.” Real support systems need more. Routing logic. Escalation workflows. Knowledge base search. Conversation logging. Graceful handoffs. That’s what we’re building here.

Before we write any code, here’s how the pieces connect.

A customer message arrives. First, the agent classifies it — billing issue, product question, complaint, or something else? That classification feeds a router.

Simple questions go straight to the knowledge base. The agent finds an answer and responds. But if the query is complex, or the customer sounds frustrated, the router sends it down the escalation path. The escalation node packages everything — the message, category, sentiment, and a suggested resolution — then flags it for a human.

Every interaction gets logged, whether resolved automatically or escalated. We’ll build each piece as a separate LangGraph node.

Prerequisites

  • Python version: 3.10+
  • Required libraries: langgraph (0.4+), langchain-openai (0.3+), langchain-core (0.3+), python-dotenv
  • Install: pip install langgraph langchain-openai langchain-core python-dotenv
  • API key: OpenAI API key stored in a .env file as OPENAI_API_KEY
  • Prior knowledge: Basic LangGraph concepts (nodes, edges, state). If you haven’t built a graph yet, start with our LangGraph Installation and First Graph post.
  • Time to complete: 35-40 minutes

Defining the Agent State

What data does the agent need to carry between nodes? The customer’s message, of course. The classification result. Sentiment. The response. Whether it escalated. That’s your state.

We’ll use TypedDict for this. Each field has a clear purpose — no vague catch-all dictionaries.

python
import os
from dotenv import load_dotenv
from typing import TypedDict, Literal, Annotated
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
from langchain_openai import ChatOpenAI

load_dotenv()

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)


class SupportState(TypedDict):
    customer_message: str
    category: str
    sentiment: str
    sentiment_score: float
    kb_results: str
    response: str
    escalated: bool
    escalation_reason: str
    log_entry: str

print("State schema defined -- 9 fields covering the full support workflow")
python
State schema defined -- 9 fields covering the full support workflow

Notice there’s no messages list here. We’re not building a chatbot — we’re building a single-turn support processor. Each customer message flows through the pipeline once. That keeps the state simple and the logic predictable.

Key Insight: **Design your state for your workflow, not for a generic chatbot.** A support agent that classifies, routes, and escalates needs structured fields (category, sentiment, escalated). A flat messages list would force every node to parse unstructured text.

Building the Classifier Node

Why should a billing question from a calm customer be handled differently from the same question from an angry one? Because sentiment matters as much as category.

The classifier assigns two things: a category (billing, product, technical, complaint, general) and a sentiment (positive, neutral, negative, angry). A calm billing question goes to the knowledge base. An angry billing question gets escalated — even if the KB has the answer.

The classifier calls the LLM with a structured prompt and asks for JSON output.

python
import json


def classify_message(state: SupportState) -> dict:
    """Classify the customer message by category and sentiment."""
    prompt = f"""Analyze this customer support message.

Message: "{state['customer_message']}"

Return JSON with exactly these fields:
- category: one of [billing, product, technical, complaint, general]
- sentiment: one of [positive, neutral, negative, angry]
- sentiment_score: float from -1.0 (very negative) to 1.0 (very positive)

Return ONLY valid JSON, no extra text."""

    result = llm.invoke(prompt)
    parsed = json.loads(result.content)

    return {
        "category": parsed["category"],
        "sentiment": parsed["sentiment"],
        "sentiment_score": parsed["sentiment_score"],
    }

print("classify_message node defined")
python
classify_message node defined

I kept the prompt minimal on purpose. In production, you’d add few-shot examples for each category. But for learning the architecture, a simple prompt keeps the focus on the graph structure.
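One failure mode worth guarding against: some models wrap their JSON in markdown fences despite the "ONLY valid JSON" instruction, which would make the bare json.loads call in classify_message raise. Here is a hedged helper (the name is mine, not part of the tutorial's core code) that strips fence lines before parsing:

```python
import json


def parse_llm_json(raw: str) -> dict:
    """Parse JSON from LLM output, tolerating markdown code fences.

    Illustrative helper, not part of the tutorial's original code.
    """
    # Drop any fence lines (e.g. an opening line with a language tag
    # and a closing line), then parse what remains
    lines = [ln for ln in raw.strip().splitlines()
             if not ln.strip().startswith("```")]
    return json.loads("\n".join(lines))


print(parse_llm_json('```json\n{"category": "billing", "sentiment": "neutral"}\n```'))
```

Swapping `json.loads(result.content)` for `parse_llm_json(result.content)` in the classifier makes it tolerant of fenced output without changing anything else.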

Building the Knowledge Base Node

In production, this node would query a vector database — Pinecone, ChromaDB, or Weaviate. For this project, a dictionary-based KB keeps us focused on the graph architecture.

The node grabs the category from the classifier and looks up matching entries. A keyword match returns that entry; if no keyword matches, the node falls back to the first entry in the category. Only an unknown category yields an empty string, which the router uses as an escalation trigger.

python
KNOWLEDGE_BASE = {
    "billing": {
        "double_charge": "If you were charged twice, we'll refund the duplicate within 3-5 business days. No action needed from your side.",
        "refund_policy": "Refunds are processed within 5-7 business days after approval. You'll receive an email confirmation.",
        "payment_methods": "We accept Visa, Mastercard, PayPal, and bank transfers.",
    },
    "product": {
        "shipping": "Standard shipping takes 5-7 business days. Express shipping (2-3 days) is available for $9.99.",
        "returns": "You can return any item within 30 days of delivery for a full refund. Items must be unused.",
        "sizing": "Check our sizing guide at /sizing. If between sizes, we recommend going one size up.",
    },
    "technical": {
        "login": "Try resetting your password at /reset-password. If that fails, clear your browser cache and try again.",
        "app_crash": "Update to the latest version from your app store. If crashes persist, email bugs@support.com with your device model.",
    },
}


def search_knowledge_base(state: SupportState) -> dict:
    """Search the knowledge base for relevant answers."""
    category = state["category"]
    message = state["customer_message"].lower()

    if category not in KNOWLEDGE_BASE:
        return {"kb_results": ""}

    best_match = ""
    for key, answer in KNOWLEDGE_BASE[category].items():
        if any(word in message for word in key.split("_")):
            best_match = answer
            break

    if not best_match:
        entries = list(KNOWLEDGE_BASE[category].values())
        best_match = entries[0] if entries else ""

    return {"kb_results": best_match}

print("search_knowledge_base node defined")
python
search_knowledge_base node defined
Tip: **Swap this for a vector search in production.** Replace the dictionary lookup with an embedding-based search using a LangChain vector store (e.g., from `langchain_community.vectorstores`). The node signature stays identical — only the internals change. That’s the power of LangGraph’s modular design.
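To make that swap concrete without standing up a vector database, here is a sketch of the same node with a token-overlap score standing in for embedding similarity. Everything in it (the scoring scheme, the trimmed-down KB copy) is illustrative; the point is that the node's signature and return shape don't change.

```python
import re

# Trimmed-down copy of the KB so this snippet runs on its own
KB = {
    "billing": {
        "refund_policy": "Refunds are processed within 5-7 business days after approval.",
        "payment_methods": "We accept Visa, Mastercard, PayPal, and bank transfers.",
    },
}


def _tokens(text: str) -> set:
    """Lowercase alphanumeric tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_kb_by_similarity(state: dict) -> dict:
    """Same signature as search_knowledge_base; only the internals differ."""
    message = _tokens(state["customer_message"])
    best_score, best_answer = 0, ""
    for key, answer in KB.get(state["category"], {}).items():
        # Stand-in for cosine similarity over embeddings: count shared tokens
        score = len(message & (_tokens(key) | _tokens(answer)))
        if score > best_score:
            best_score, best_answer = score, answer
    return {"kb_results": best_answer}


result = search_kb_by_similarity(
    {"category": "billing", "customer_message": "How long does a refund take?"}
)
print(result["kb_results"])  # matches the refund_policy entry
```

In production the scoring line is where the embedding call goes; the rest of the graph never notices the difference.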

Building the Response Generator Node

Raw KB answers sound robotic. “Refunds are processed within 5-7 business days” is accurate but cold. The response generator turns that into a human-sounding reply.

It adapts tone based on sentiment. An angry customer gets extra empathy. A neutral customer gets a direct, professional answer.

python
def generate_response(state: SupportState) -> dict:
    """Generate a customer-friendly response using KB results."""
    tone_guide = {
        "positive": "friendly and warm",
        "neutral": "professional and clear",
        "negative": "empathetic and solution-focused",
        "angry": "deeply empathetic, apologetic, and action-oriented",
    }
    tone = tone_guide.get(state["sentiment"], "professional")

    prompt = f"""Write a customer support response.

Customer message: "{state['customer_message']}"
Category: {state['category']}
Relevant information: {state['kb_results']}

Tone: Be {tone}.
Keep the response under 3 sentences.
Address the customer's concern directly.
Do NOT make up information not in the relevant information above."""

    result = llm.invoke(prompt)
    return {"response": result.content}

print("generate_response node defined")
python
generate_response node defined

That tone mapping is deliberate. A support bot that responds to “I’m furious about this charge!” with “Thanks for reaching out!” kills trust instantly. Matching tone to sentiment is the difference between helpful and infuriating.

Building the Escalation Node

Sometimes the agent needs to step back. The escalation node packages everything a human agent needs: the message, category, sentiment, and a suggested resolution.

Why the suggested resolution? Human agents handle dozens of tickets daily. A starting point like “suggest refunding the duplicate charge” saves them time and keeps resolutions consistent.

python
def escalate_to_human(state: SupportState) -> dict:
    """Package the ticket for human agent review."""
    prompt = f"""A customer support ticket needs human review.

Customer message: "{state['customer_message']}"
Category: {state['category']}
Sentiment: {state['sentiment']} (score: {state['sentiment_score']})
Knowledge base results: {state['kb_results'] or 'No relevant KB entry found'}

Write a brief escalation summary (2-3 sentences) that:
1. States why this needs human attention
2. Suggests a resolution
3. Notes the customer's emotional state"""

    result = llm.invoke(prompt)

    return {
        "escalated": True,
        "escalation_reason": result.content,
        "response": f"I understand your concern. I've escalated this to a specialist who will review your case. You'll hear back within 2 hours.",
    }

print("escalate_to_human node defined")
python
escalate_to_human node defined
Warning: **Never tell the customer “I can’t help you.”** Always provide a concrete next step. “I’ve escalated this to a specialist” with a time frame is much better than “I don’t know the answer.” Your auto-response should set clear expectations.

Building the Logger Node

No paper trail? No accountability. The logger creates a structured JSON entry capturing what happened, what the agent decided, and the outcome.

It runs at the end of every path — both the auto-response branch and the escalation branch.

python
from datetime import datetime


def log_interaction(state: SupportState) -> dict:
    """Create a structured log entry for the interaction."""
    log = {
        "timestamp": datetime.now().isoformat(),
        "customer_message": state["customer_message"],
        "category": state["category"],
        "sentiment": state["sentiment"],
        "sentiment_score": state["sentiment_score"],
        "escalated": state.get("escalated", False),
        "response_preview": state["response"][:100],
    }

    if state.get("escalated"):
        log["escalation_reason"] = state.get("escalation_reason", "")

    log_str = json.dumps(log, indent=2)
    print(f"--- Interaction Log ---\n{log_str}")
    return {"log_entry": log_str}

print("log_interaction node defined")
python
log_interaction node defined
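Printing the log is fine for learning. In a real service you would persist each entry somewhere queryable; here is a minimal sketch (the file path is my choice, not from the tutorial) that appends entries to a JSONL file:

```python
import json
from datetime import datetime
from pathlib import Path

LOG_FILE = Path("support_log.jsonl")  # hypothetical location for this sketch


def append_log(entry: dict) -> None:
    """Append one interaction log as a single JSON line (JSONL format)."""
    with LOG_FILE.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


append_log({
    "timestamp": datetime.now().isoformat(),
    "category": "billing",
    "escalated": False,
})
print(LOG_FILE.read_text(encoding="utf-8").strip().splitlines()[-1])
```

Each line is an independent JSON object, so the file can be tailed, grepped, or bulk-loaded into a log pipeline without parsing the whole file.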

Wiring the Router — Conditional Edges

Here’s the brain of the system. Should the agent respond directly, or hand off to a human?

Three factors drive the decision:

  • Sentiment: Angry customers always get escalated.
  • Category: Complaints go to humans regardless of tone.
  • KB coverage: No knowledge base results? Escalate.

LangGraph handles this with conditional edges — a function that returns the next node name based on state.

python
def route_support(state: SupportState) -> str:
    """Decide whether to respond directly or escalate."""
    # Angry customers always get escalated
    if state["sentiment"] == "angry":
        return "escalate"

    # Complaints go to humans
    if state["category"] == "complaint":
        return "escalate"

    # Very negative sentiment gets escalated
    if state["sentiment_score"] < -0.5:
        return "escalate"

    # No KB results -- escalate
    if not state.get("kb_results"):
        return "escalate"

    return "respond"

print("route_support router defined")
python
route_support router defined

I prefer rule-based routing over LLM-based decisions for this. Rules are predictable, testable, and don’t burn API credits on decisions you can codify. Save the LLM for tasks that genuinely need language understanding.
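That predictability is easy to demonstrate: because the router is a pure function of state, every rule can be unit-tested without touching the LLM. A quick sketch (with a local copy of route_support so the snippet runs standalone):

```python
def route_support(state: dict) -> str:
    """Local copy of the router above, repeated so this snippet is self-contained."""
    if state["sentiment"] == "angry":
        return "escalate"
    if state["category"] == "complaint":
        return "escalate"
    if state["sentiment_score"] < -0.5:
        return "escalate"
    if not state.get("kb_results"):
        return "escalate"
    return "respond"


# A calm billing question with KB coverage: the only case that auto-responds
base = {"sentiment": "neutral", "category": "billing",
        "sentiment_score": 0.0, "kb_results": "refund info"}

assert route_support(base) == "respond"
assert route_support({**base, "sentiment": "angry"}) == "escalate"
assert route_support({**base, "category": "complaint"}) == "escalate"
assert route_support({**base, "sentiment_score": -0.9}) == "escalate"
assert route_support({**base, "kb_results": ""}) == "escalate"
print("all routing rules verified")
```

Try doing that with an LLM-based router: you would need mocked API calls and you still couldn't guarantee the same answer twice.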

Assembling the Graph

Five nodes. Two paths. One StateGraph. Here’s how they connect.

The classifier runs first, then the KB search, then the router picks a path. One path leads to the response generator. The other leads to escalation. Both converge at the logger before ending.

python
# Build the graph
workflow = StateGraph(SupportState)

# Add nodes
workflow.add_node("classify", classify_message)
workflow.add_node("search_kb", search_knowledge_base)
workflow.add_node("respond", generate_response)
workflow.add_node("escalate", escalate_to_human)
workflow.add_node("log", log_interaction)

# Wire edges
workflow.add_edge(START, "classify")
workflow.add_edge("classify", "search_kb")

# Conditional routing after KB search
workflow.add_conditional_edges(
    "search_kb",
    route_support,
    {"respond": "respond", "escalate": "escalate"},
)

workflow.add_edge("respond", "log")
workflow.add_edge("escalate", "log")
workflow.add_edge("log", END)

# Compile with checkpointer
checkpointer = MemorySaver()
support_agent = workflow.compile(checkpointer=checkpointer)

print("Support agent compiled -- 5 nodes, 2 paths, conditional routing active")
python
Support agent compiled -- 5 nodes, 2 paths, conditional routing active
Key Insight: **The graph structure IS your business logic.** When a product manager asks “why did the agent escalate?”, you can point to the exact conditional edge and routing function. That’s not possible with a single-prompt approach where the LLM makes all decisions inside one call.

Running the Agent — Test Scenarios

Does the routing actually work? Let’s find out. Three test cases: a simple question (should auto-respond), a furious customer (should escalate), and a formal complaint (should escalate regardless of tone).

Each invocation needs a unique thread_id. That’s how LangGraph keeps separate conversations apart in the checkpointer.

Test 1: Simple billing question (should auto-respond)

python
config = {"configurable": {"thread_id": "test-1"}}

result = support_agent.invoke(
    {
        "customer_message": "What payment methods do you accept?",
        "category": "",
        "sentiment": "",
        "sentiment_score": 0.0,
        "kb_results": "",
        "response": "",
        "escalated": False,
        "escalation_reason": "",
        "log_entry": "",
    },
    config=config,
)

print(f"\nCategory: {result['category']}")
print(f"Sentiment: {result['sentiment']}")
print(f"Escalated: {result['escalated']}")
print(f"Response: {result['response']}")

The agent should classify this as billing with neutral sentiment, find the payment methods entry, and respond directly. No escalation. (If the model labeled it general instead, the KB has no general section, so the empty result would trigger escalation — a reminder that routing quality depends on classification quality.)

Test 2: Angry customer (should escalate)

python
config_2 = {"configurable": {"thread_id": "test-2"}}

result_2 = support_agent.invoke(
    {
        "customer_message": "I was charged THREE TIMES for the same order! This is absolutely unacceptable. I want a full refund NOW and I'm filing a complaint with my bank.",
        "category": "",
        "sentiment": "",
        "sentiment_score": 0.0,
        "kb_results": "",
        "response": "",
        "escalated": False,
        "escalation_reason": "",
        "log_entry": "",
    },
    config=config_2,
)

print(f"\nCategory: {result_2['category']}")
print(f"Sentiment: {result_2['sentiment']}")
print(f"Escalated: {result_2['escalated']}")
print(f"Escalation reason: {result_2['escalation_reason'][:200]}")

Clearly angry. The classifier should assign sentiment: angry, which triggers immediate escalation. Category and KB results don’t matter here — the sentiment alone is enough.

Test 3: Complaint (should escalate regardless of sentiment)

python
config_3 = {"configurable": {"thread_id": "test-3"}}

result_3 = support_agent.invoke(
    {
        "customer_message": "I'd like to file a formal complaint. Your delivery service damaged my package and the item inside is broken.",
        "category": "",
        "sentiment": "",
        "sentiment_score": 0.0,
        "kb_results": "",
        "response": "",
        "escalated": False,
        "escalation_reason": "",
        "log_entry": "",
    },
    config=config_3,
)

print(f"\nCategory: {result_3['category']}")
print(f"Sentiment: {result_3['sentiment']}")
print(f"Escalated: {result_3['escalated']}")
print(f"Response: {result_3['response']}")

Even if this customer sounds relatively calm, the complaint category triggers escalation. That’s your routing rules doing their job.

Exercise 1: Add a Priority Level to the Router

Now that you’ve seen the routing logic, try modifying it. Add a priority field to the state and set it based on the routing decision.

typescript
{
  type: 'exercise',
  id: 'priority-routing',
  title: 'Exercise 1: Add Priority-Based Routing',
  difficulty: 'advanced',
  exerciseType: 'write',
  instructions: 'Modify the route_support function to also return a priority level. Add a "priority" field to SupportState. Set priority to "urgent" for angry customers, "high" for complaints and negative sentiment, and "normal" for everything else. Print the priority after routing.',
  starterCode: '# Add "priority" to SupportState\n# Modify route_support to set priority\n\ndef route_support_v2(state):\n    # Your escalation logic here\n    # Set state["priority"] based on rules\n    # Return "escalate" or "respond"\n    pass\n\n# Test with an angry message\ntest_state = {\n    "customer_message": "This is outrageous!",\n    "sentiment": "angry",\n    "sentiment_score": -0.9,\n    "category": "billing",\n    "kb_results": "Refund policy info",\n}\nresult = route_support_v2(test_state)\nprint(f"Route: {result}")\nprint(f"Priority: {test_state.get(\'priority\', \'not set\')}")',
  testCases: [
    { id: 'tc1', input: 'test_state = {"sentiment": "angry", "sentiment_score": -0.9, "category": "billing", "kb_results": "info"}; route_support_v2(test_state); print(test_state["priority"])', expectedOutput: 'urgent', description: 'Angry customers should be urgent priority' },
    { id: 'tc2', input: 'test_state = {"sentiment": "neutral", "sentiment_score": 0.2, "category": "complaint", "kb_results": "info"}; route_support_v2(test_state); print(test_state["priority"])', expectedOutput: 'high', description: 'Complaints should be high priority' },
    { id: 'tc3', input: 'test_state = {"sentiment": "neutral", "sentiment_score": 0.5, "category": "product", "kb_results": "info"}; route_support_v2(test_state); print(test_state["priority"])', expectedOutput: 'normal', description: 'Neutral product queries should be normal priority' },
  ],
  hints: [
    'Check sentiment first (angry = urgent), then category (complaint = high), then default to normal.',
    'Set state["priority"] = "urgent" at the top if sentiment == "angry", then return "escalate". For complaints, set "high". Everything else gets "normal".',
  ],
  solution: 'def route_support_v2(state):\n    if state["sentiment"] == "angry":\n        state["priority"] = "urgent"\n        return "escalate"\n    if state["category"] == "complaint":\n        state["priority"] = "high"\n        return "escalate"\n    if state["sentiment_score"] < -0.5:\n        state["priority"] = "high"\n        return "escalate"\n    if not state.get("kb_results"):\n        state["priority"] = "high"\n        return "escalate"\n    state["priority"] = "normal"\n    return "respond"',
  solutionExplanation: 'The priority assignment mirrors the routing logic. Angry customers get the highest priority (urgent) and always escalate. Complaints and very negative sentiment get high priority. Everything else is normal and routes to the auto-response path.',
  xpReward: 20,
}

Adding Human-in-the-Loop Approval

What if escalation decisions themselves need review? Maybe a junior agent shouldn’t escalate directly — a supervisor should approve first.

LangGraph’s interrupt function handles this. It pauses the graph mid-execution, shows context to a human reviewer, and waits. The graph only continues when the reviewer sends a Command to resume.

Two changes are needed: an interrupt inside the escalation node, and Command(resume=...) to continue with the human’s decision.

python
from langgraph.types import interrupt, Command


def escalate_with_approval(state: SupportState) -> dict:
    """Escalate and pause for human approval."""
    prompt = f"""A customer support ticket needs human review.

Customer message: "{state['customer_message']}"
Category: {state['category']}
Sentiment: {state['sentiment']} (score: {state['sentiment_score']})

Write a 2-sentence escalation summary."""

    result = llm.invoke(prompt)
    escalation_summary = result.content

    # Pause here -- show the human what's happening
    human_decision = interrupt({
        "escalation_summary": escalation_summary,
        "customer_message": state["customer_message"],
        "question": "Approve this escalation? (approve/reject/modify)",
    })

    # human_decision comes from Command(resume=...)
    if human_decision.get("action") == "reject":
        return {
            "escalated": False,
            "response": "Let me look into this further for you.",
            "escalation_reason": "Rejected by reviewer",
        }

    return {
        "escalated": True,
        "escalation_reason": escalation_summary,
        "response": human_decision.get(
            "custom_response",
            "I've escalated this to a specialist. You'll hear back within 2 hours.",
        ),
    }

print("escalate_with_approval node defined -- uses interrupt for HITL")
python
escalate_with_approval node defined -- uses interrupt for HITL

Here’s the graph with the approval node wired in. The first invoke runs until it hits the interrupt and pauses. The second invoke resumes with the reviewer’s decision.

python
# Build graph with approval step
approval_workflow = StateGraph(SupportState)
approval_workflow.add_node("classify", classify_message)
approval_workflow.add_node("search_kb", search_knowledge_base)
approval_workflow.add_node("respond", generate_response)
approval_workflow.add_node("escalate", escalate_with_approval)
approval_workflow.add_node("log", log_interaction)

approval_workflow.add_edge(START, "classify")
approval_workflow.add_edge("classify", "search_kb")
approval_workflow.add_conditional_edges(
    "search_kb",
    route_support,
    {"respond": "respond", "escalate": "escalate"},
)
approval_workflow.add_edge("respond", "log")
approval_workflow.add_edge("escalate", "log")
approval_workflow.add_edge("log", END)

approval_agent = approval_workflow.compile(checkpointer=MemorySaver())
print("Approval-enabled agent compiled")
python
Approval-enabled agent compiled

The approval workflow takes two steps. First, invoke until the interrupt fires. Second, resume with the reviewer’s decision.

python
config_approval = {"configurable": {"thread_id": "approval-test-1"}}

# Step 1: Invoke -- this will pause at the interrupt
first_result = approval_agent.invoke(
    {
        "customer_message": "I'm really frustrated. I've been waiting 3 weeks for my order and nobody responds to my emails.",
        "category": "",
        "sentiment": "",
        "sentiment_score": 0.0,
        "kb_results": "",
        "response": "",
        "escalated": False,
        "escalation_reason": "",
        "log_entry": "",
    },
    config=config_approval,
)
print("Graph paused at interrupt -- waiting for human decision")

python
# Step 2: Resume with human approval
final_result = approval_agent.invoke(
    Command(resume={"action": "approve", "custom_response": "I sincerely apologize for the delay. I've personally flagged your order for priority review. Expect an update within 24 hours."}),
    config=config_approval,
)
print(f"Escalated: {final_result['escalated']}")
print(f"Response: {final_result['response']}")

Note: **The interrupt/Command pattern is the core of LangGraph’s HITL system.** The graph saves its full state at the interrupt point. Even if your server restarts, the graph can resume from exactly where it paused — as long as your checkpointer persists state (use PostgresSaver in production).
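As the note says, durable pauses need a persistent checkpointer. Here is a sketch of the swap, assuming the separately installed langgraph-checkpoint-postgres package and a placeholder connection string (not runnable without a Postgres instance):

```python
# Requires: pip install langgraph-checkpoint-postgres psycopg
from langgraph.checkpoint.postgres import PostgresSaver

# Placeholder URI -- point this at your own database
DB_URI = "postgresql://user:password@localhost:5432/support_db"

with PostgresSaver.from_conn_string(DB_URI) as checkpointer:
    checkpointer.setup()  # creates the checkpoint tables on first use
    durable_agent = approval_workflow.compile(checkpointer=checkpointer)
    # Interrupted runs now survive process restarts: resume with the same
    # thread_id and Command(resume=...) as before.
```

The graph code is untouched; only the compile call changes.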

Exercise 2: Add a Modification Path

The current approval step has two options: approve or reject. But human reviewers often want to modify the response before sending it. Add a “modify” action to the escalation approval.

typescript
{
  type: 'exercise',
  id: 'modify-escalation',
  title: 'Exercise 2: Add Response Modification to HITL',
  difficulty: 'advanced',
  exerciseType: 'write',
  instructions: 'Extend the escalate_with_approval function to handle a third action: "modify". When the human sends action="modify" with a "modified_response" field, use their modified response instead of the auto-generated one. The escalation should still be marked as True.',
  starterCode: '# Extend the human_decision handling\ndef handle_human_decision(human_decision, escalation_summary):\n    """Process the human reviewer decision.\n    \n    Args:\n        human_decision: dict with "action" and optional fields\n        escalation_summary: the AI-generated summary\n    \n    Returns:\n        dict with escalated, response, escalation_reason\n    """\n    # Handle "approve", "reject", and "modify" actions\n    pass\n\n# Test\nresult = handle_human_decision(\n    {"action": "modify", "modified_response": "Custom reply here"},\n    "Customer upset about shipping"\n)\nprint(f"Escalated: {result[\'escalated\']}")\nprint(f"Response: {result[\'response\']}")',
  testCases: [
    { id: 'tc1', input: 'r = handle_human_decision({"action": "modify", "modified_response": "Custom reply"}, "summary"); print(r["escalated"]); print(r["response"])', expectedOutput: 'True\nCustom reply', description: 'Modify action should use the custom response' },
    { id: 'tc2', input: 'r = handle_human_decision({"action": "reject"}, "summary"); print(r["escalated"])', expectedOutput: 'False', description: 'Reject should not escalate' },
  ],
  hints: [
    'Add an elif for action == "modify" that reads human_decision["modified_response"].',
    'if action == "approve": return {"escalated": True, "response": default, ...}. elif action == "modify": return {"escalated": True, "response": human_decision["modified_response"], ...}. else reject.',
  ],
  solution: 'def handle_human_decision(human_decision, escalation_summary):\n    action = human_decision.get("action", "approve")\n    if action == "reject":\n        return {"escalated": False, "response": "Let me look into this further.", "escalation_reason": "Rejected"}\n    elif action == "modify":\n        return {"escalated": True, "response": human_decision["modified_response"], "escalation_reason": escalation_summary}\n    else:\n        return {"escalated": True, "response": "Escalated to specialist. Response within 2 hours.", "escalation_reason": escalation_summary}',
  solutionExplanation: 'The modify path keeps escalated=True (the ticket still goes to a human) but uses the reviewer custom response instead of the auto-generated one. This gives human agents control over the customer-facing message while preserving the escalation workflow.',
  xpReward: 20,
}

Common Mistakes and How to Fix Them

Mistake 1: Forgetting the checkpointer for HITL workflows

Wrong:

python
# This compiles but interrupt() will fail at runtime
agent = workflow.compile()

Why it breaks: Without a checkpointer, LangGraph can’t save the graph state when interrupt() pauses execution. The graph has nowhere to store its progress.

Correct:

python
agent = workflow.compile(checkpointer=MemorySaver())

Mistake 2: Reusing thread IDs across different conversations

Wrong:

python
# Same thread_id for different customers
config = {"configurable": {"thread_id": "main"}}
agent.invoke(message_1, config=config)
agent.invoke(message_2, config=config)  # Sees state from message_1!

Why it breaks: LangGraph tracks conversations by thread_id. Reusing the same ID makes the second invocation inherit state from the first — including the previous customer’s data.

Correct:

python
import uuid
config_1 = {"configurable": {"thread_id": str(uuid.uuid4())}}
config_2 = {"configurable": {"thread_id": str(uuid.uuid4())}}

Mistake 3: Not handling missing state fields

Wrong:

python
def route_support(state):
    if state["sentiment"] == "angry":  # KeyError if classify didn't run
        return "escalate"

Why it breaks: If a node upstream fails or doesn’t set a field, accessing it directly raises a KeyError.

Correct:

python
def route_support(state):
    if state.get("sentiment", "") == "angry":
        return "escalate"

When NOT to Use This Architecture

This agent pattern works well for structured, category-based support. But it’s not the right tool for every scenario.

Don’t use this when:

  • You need multi-turn conversations. This agent handles single messages. For back-and-forth dialogue, you’d need a messages list in state and a conversation loop.
  • Your categories change weekly. A hard-coded category list requires redeployment. If categories shift often, use an LLM-based classifier without fixed options.
  • You have fewer than 50 support tickets per day. The engineering overhead doesn’t pay off at small scale. A simple chain or even a prompt is sufficient.

Use this when:

  • You need auditable routing decisions with clear logic.
  • Human escalation is a core requirement, not an afterthought.
  • You want to swap components (KB, classifier, response generator) independently.

Production Considerations

The code in this tutorial is designed for learning. Here’s what changes for production.

Tip: **Production vs. tutorial differences** — keep these in mind when deploying.
| Tutorial Version | Production Version |
| --- | --- |
| MemorySaver (in-memory) | PostgresSaver or RedisSaver (persistent) |
| Dictionary knowledge base | Vector database (Pinecone, ChromaDB, Weaviate) |
| gpt-4o-mini for all nodes | Cheaper model for classification, stronger model for response generation |
| No authentication | JWT tokens, API keys for every endpoint |
| Print-based logging | Structured logging with ELK stack or CloudWatch |
| Single-turn processing | Queue-based processing with retry logic |

The biggest difference is the checkpointer. MemorySaver loses everything on restart. For any system where escalation pauses matter, you need persistence.

Summary

You built a customer support agent with five components: a classifier, a knowledge base search, a response generator, an escalation handler, and a logger. The conditional routing makes it smart — it responds to simple queries automatically but escalates complex or emotional cases to humans.

The key pattern to remember: design your graph around business decisions, not just LLM calls. The routing function, the escalation criteria, the tone mapping — these are business rules expressed as code. They’re testable, auditable, and don’t depend on LLM behavior.

Here’s a practice exercise to solidify what you’ve learned:

Practice: Add a feedback collection node

After the response is sent (but before logging), add a node that asks the customer to rate the interaction on a 1-5 scale. Store the rating in a new `feedback_score` field in the state. The logger should include this score in its output.

**Solution:**

python
def collect_feedback(state: SupportState) -> dict:
    """Collect a customer rating for the interaction."""
    # In production, this pauses the graph and waits for real customer input.
    # interrupt() surfaces the payload to the caller and returns whatever
    # value the graph is resumed with.
    feedback = interrupt({
        "question": "How would you rate this interaction? (1-5)",
        "response_given": state["response"],
    })
    # Requires adding `feedback_score: int` to the SupportState definition
    return {"feedback_score": feedback.get("rating", 0)}

Add it to the graph between the response/escalation nodes and the logger:

python
workflow.add_node("feedback", collect_feedback)
workflow.add_edge("respond", "feedback")
workflow.add_edge("escalate", "feedback")
workflow.add_edge("feedback", "log")

Complete Code

Click to expand the full script (copy-paste and run)
python
# Complete code from: Build a Customer Support Agent with Escalation Workflows
# Requires: pip install langgraph langchain-openai langchain-core python-dotenv
# Python 3.10+

import os
import json
import uuid
from datetime import datetime
from dotenv import load_dotenv
from typing import TypedDict, Literal
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import interrupt, Command
from langchain_openai import ChatOpenAI

load_dotenv()

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# --- State Definition ---
class SupportState(TypedDict):
    customer_message: str
    category: str
    sentiment: str
    sentiment_score: float
    kb_results: str
    response: str
    escalated: bool
    escalation_reason: str
    log_entry: str

# --- Knowledge Base ---
KNOWLEDGE_BASE = {
    "billing": {
        "double_charge": "If you were charged twice, we'll refund the duplicate within 3-5 business days. No action needed from your side.",
        "refund_policy": "Refunds are processed within 5-7 business days after approval. You'll receive an email confirmation.",
        "payment_methods": "We accept Visa, Mastercard, PayPal, and bank transfers.",
    },
    "product": {
        "shipping": "Standard shipping takes 5-7 business days. Express shipping (2-3 days) is available for $9.99.",
        "returns": "You can return any item within 30 days of delivery for a full refund. Items must be unused.",
        "sizing": "Check our sizing guide at /sizing. If between sizes, we recommend going one size up.",
    },
    "technical": {
        "login": "Try resetting your password at /reset-password. If that fails, clear your browser cache and try again.",
        "app_crash": "Update to the latest version from your app store. If crashes persist, email bugs@support.com with your device model.",
    },
}

# --- Nodes ---
def classify_message(state: SupportState) -> dict:
    prompt = f"""Analyze this customer support message.
Message: "{state['customer_message']}"
Return JSON with exactly these fields:
- category: one of [billing, product, technical, complaint, general]
- sentiment: one of [positive, neutral, negative, angry]
- sentiment_score: float from -1.0 (very negative) to 1.0 (very positive)
Return ONLY valid JSON, no extra text."""
    result = llm.invoke(prompt)
    # Strip accidental markdown fences before parsing, in case the model
    # wraps the JSON despite the instructions
    content = result.content.strip().removeprefix("```json").removeprefix("```").rstrip("`").strip()
    parsed = json.loads(content)
    return {
        "category": parsed["category"],
        "sentiment": parsed["sentiment"],
        "sentiment_score": parsed["sentiment_score"],
    }

def search_knowledge_base(state: SupportState) -> dict:
    category = state["category"]
    message = state["customer_message"].lower()
    if category not in KNOWLEDGE_BASE:
        return {"kb_results": ""}
    best_match = ""
    for key, answer in KNOWLEDGE_BASE[category].items():
        if any(word in message for word in key.split("_")):
            best_match = answer
            break
    if not best_match:
        entries = list(KNOWLEDGE_BASE[category].values())
        best_match = entries[0] if entries else ""
    return {"kb_results": best_match}

def generate_response(state: SupportState) -> dict:
    tone_guide = {
        "positive": "friendly and warm",
        "neutral": "professional and clear",
        "negative": "empathetic and solution-focused",
        "angry": "deeply empathetic, apologetic, and action-oriented",
    }
    tone = tone_guide.get(state["sentiment"], "professional")
    prompt = f"""Write a customer support response.
Customer message: "{state['customer_message']}"
Category: {state['category']}
Relevant information: {state['kb_results']}
Tone: Be {tone}. Keep under 3 sentences. Address the concern directly."""
    result = llm.invoke(prompt)
    return {"response": result.content}

def escalate_to_human(state: SupportState) -> dict:
    prompt = f"""A customer support ticket needs human review.
Customer message: "{state['customer_message']}"
Category: {state['category']}
Sentiment: {state['sentiment']} (score: {state['sentiment_score']})
KB results: {state['kb_results'] or 'No relevant KB entry found'}
Write a 2-3 sentence escalation summary."""
    result = llm.invoke(prompt)
    return {
        "escalated": True,
        "escalation_reason": result.content,
        "response": "I understand your concern. I've escalated this to a specialist who will review your case. You'll hear back within 2 hours.",
    }

def log_interaction(state: SupportState) -> dict:
    log = {
        "timestamp": datetime.now().isoformat(),
        "customer_message": state["customer_message"],
        "category": state["category"],
        "sentiment": state["sentiment"],
        "sentiment_score": state["sentiment_score"],
        "escalated": state.get("escalated", False),
        "response_preview": state["response"][:100],
    }
    if state.get("escalated"):
        log["escalation_reason"] = state.get("escalation_reason", "")
    log_str = json.dumps(log, indent=2)
    print(f"--- Interaction Log ---\n{log_str}")
    return {"log_entry": log_str}

# --- Router ---
def route_support(state: SupportState) -> str:
    if state.get("sentiment", "") == "angry":
        return "escalate"
    if state.get("category", "") == "complaint":
        return "escalate"
    if state.get("sentiment_score", 0) < -0.5:
        return "escalate"
    if not state.get("kb_results"):
        return "escalate"
    return "respond"

# --- Build Graph ---
workflow = StateGraph(SupportState)
workflow.add_node("classify", classify_message)
workflow.add_node("search_kb", search_knowledge_base)
workflow.add_node("respond", generate_response)
workflow.add_node("escalate", escalate_to_human)
workflow.add_node("log", log_interaction)

workflow.add_edge(START, "classify")
workflow.add_edge("classify", "search_kb")
workflow.add_conditional_edges(
    "search_kb",
    route_support,
    {"respond": "respond", "escalate": "escalate"},
)
workflow.add_edge("respond", "log")
workflow.add_edge("escalate", "log")
workflow.add_edge("log", END)

checkpointer = MemorySaver()
support_agent = workflow.compile(checkpointer=checkpointer)

# --- Run ---
config = {"configurable": {"thread_id": str(uuid.uuid4())}}
result = support_agent.invoke(
    {
        "customer_message": "What payment methods do you accept?",
        "category": "",
        "sentiment": "",
        "sentiment_score": 0.0,
        "kb_results": "",
        "response": "",
        "escalated": False,
        "escalation_reason": "",
        "log_entry": "",
    },
    config=config,
)
print(f"\nCategory: {result['category']}")
print(f"Escalated: {result['escalated']}")
print(f"Response: {result['response']}")

print("\nScript completed successfully.")

Frequently Asked Questions

Can I use a different LLM instead of OpenAI?

Yes. Swap ChatOpenAI for any LangChain-compatible chat model. ChatAnthropic, ChatGroq, or ChatOllama all work. The graph structure doesn’t change — only the model initialization.

How do I add more categories to the classifier?

Add the new category to the classifier prompt and add corresponding entries to KNOWLEDGE_BASE. The router’s route_support function may also need updates if the new category has special routing rules.

What’s the difference between interrupt and interrupt_before?

interrupt pauses execution inside a node at runtime and can pass data to the human reviewer. interrupt_before (the older, static approach, passed to compile()) pauses before a listed node runs, with no payload. For new projects, use interrupt — it gives you more control and cleaner code.

How do I persist state across server restarts?

Replace MemorySaver() with PostgresSaver from langgraph-checkpoint-postgres. Install it with pip install langgraph-checkpoint-postgres and configure your database URL. In recent versions, from_conn_string is a context manager, and you call setup() once to create the checkpoint tables — the rest of the graph API stays identical.

python
from langgraph.checkpoint.postgres import PostgresSaver

with PostgresSaver.from_conn_string("postgresql://user:pass@localhost/db") as checkpointer:
    checkpointer.setup()  # creates the checkpoint tables on first run
    support_agent = workflow.compile(checkpointer=checkpointer)

Can this agent handle multiple languages?

The LLM handles language detection and response generation natively. Add a language field to the state and include “detect the language” in the classifier prompt. The response generator will reply in the detected language automatically.
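One way to wire that in — the `language` field and the extra prompt line below are illustrative assumptions, not part of the tutorial code:

```python
from typing import TypedDict

# Sketch: extend the state with a detected-language field (assumed addition).
class SupportState(TypedDict):
    customer_message: str
    language: str  # e.g. "en", "es" -- filled in by the classifier
    # ...the remaining tutorial fields stay unchanged

# Extra line to append to the classifier prompt (illustrative wording):
LANGUAGE_INSTRUCTION = (
    '- language: the ISO 639-1 code of the message language (e.g. "en", "de")'
)
```

The response-generation prompt can then include "Reply in {language}" so the tone mapping and KB answer are delivered in the customer's own language.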
