LangGraph Platform — Deploying Agents as APIs with LangGraph Server
You’ve built a LangGraph agent. It works on your laptop. But how do you let a frontend app, a Slack bot, or another service call it? You can’t ask users to import your graph and run graph.invoke() locally.
That’s the gap LangGraph Platform fills. It wraps your graph in an API server with built-in endpoints for streaming, state management, and long-running tasks. By the end of this article, you’ll deploy an agent as an API and call it from a client.
Here’s the big picture before we touch any code.
Your LangGraph graph is the brain. LangGraph Server wraps that brain in an HTTP API. It gives you REST endpoints for creating threads, sending messages, and streaming responses. The SDK client is a Python library that talks to those endpoints so you skip raw HTTP calls. And LangGraph Platform is the umbrella — server, CLI, Studio (a visual debugger), and cloud hosting combined.
The data flow goes like this. A client sends a request to the server. The server loads the right graph, runs it with the input, and persists state in a database. Then it streams the response back. Every run gets checkpointed automatically. Your agent handles long-running tasks, survives restarts, and picks up where it left off.
What Is LangGraph Platform?
LangGraph Platform is the deployment layer for LangGraph agents. It takes a graph you’ve built and turns it into a production service. No FastAPI routes to write, no database to manage, no streaming infrastructure to build.
The platform has four pieces:
- LangGraph Server — an API server with 30+ endpoints for threads, runs, streaming, assistants, and cron jobs.
- LangGraph SDK — Python and JavaScript client libraries for talking to the server.
- LangGraph CLI — a command-line tool for building, testing, and deploying locally.
- LangGraph Studio — a visual IDE for testing and debugging graphs interactively.
Why does this matter? Because deploying an agent isn’t the same as deploying a REST API. Agents need persistent state across requests. They need to stream tokens. They run tasks that take minutes, not milliseconds. Solving each of these yourself is months of work.
Key Insight: LangGraph Platform isn’t just “hosting.” It solves the hard infrastructure problems — persistent state, long-running background tasks, token-by-token streaming, and horizontal scaling — that you’d otherwise build yourself.
Prerequisites
- Python version: 3.10+
- Required libraries: langgraph (0.4+), langgraph-sdk (0.1.51+), langchain-openai (0.3+), langchain-core (0.3+)
- Install: pip install langgraph langgraph-cli langgraph-sdk langchain-openai langchain-core
- API key: an OpenAI API key set as OPENAI_API_KEY. See OpenAI’s docs to create one.
- Docker: required for langgraph up. Install from docker.com.
- Time to complete: ~40 minutes
- Prior knowledge: Basic LangGraph concepts (nodes, edges, state) from earlier posts in this series.
Setting Up a LangGraph Project for Deployment
Ever tried deploying a Python script and realized you need a whole project structure first? Same thing here. LangGraph Server needs to know where your graph lives and what dependencies to install.
Here’s the minimum folder layout:
my-agent/
├── agent.py # Your graph definition
├── langgraph.json # Server configuration
└── requirements.txt # Python dependencies
The langgraph.json file is what the server reads on startup. It maps graph names to Python import paths. Here’s a minimal config:
{
"dependencies": ["."],
"graphs": {
"agent": "./agent.py:graph"
},
"env": ".env"
}
Three fields, three jobs. The "graphs" field maps the name "agent" to a Python path — "./agent.py:graph" means “import the graph variable from agent.py.” The "dependencies" field says “install packages from the current directory.” And "env" points to your environment variables file.
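The path:variable convention is easy to get wrong, so here is a tiny illustrative parser (my own helper, not part of the CLI or server) that splits a graph reference the same way the server interprets it:

```python
import json

def parse_graph_ref(ref: str) -> tuple[str, str]:
    """Split a graph reference like './agent.py:graph' into the file
    path and the variable name, mirroring how the server imports it."""
    path, sep, var = ref.partition(":")
    if not sep or not var:
        raise ValueError(f"expected 'path:variable', got {ref!r}")
    return path, var

# Parse the minimal config shown above
config = json.loads('{"graphs": {"agent": "./agent.py:graph"}}')
for name, ref in config["graphs"].items():
    path, var = parse_graph_ref(ref)
    print(name, "->", path, ":", var)
```

If the server fails to start with an import error, checking each graph reference against this rule (file path, colon, variable name) is usually the first debugging step.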
Now we need the actual agent code. This graph uses an LLM with tool-calling. The agent answers questions and calls a weather tool when needed. It follows the ReAct pattern — the LLM decides whether to call a tool or respond directly.
# agent.py
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import ToolNode, tools_condition
from langchain_core.tools import tool
@tool
def get_weather(city: str) -> str:
"""Get the current weather for a city."""
# Simplified for demo — real apps call a weather API
weather_data = {
"London": "Cloudy, 15C",
"Tokyo": "Sunny, 22C",
"New York": "Rainy, 18C",
}
return weather_data.get(city, f"No data for {city}")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
llm_with_tools = llm.bind_tools([get_weather])
def assistant(state: MessagesState):
"""Call the LLM with tool bindings."""
return {"messages": [llm_with_tools.invoke(state["messages"])]}
graph_builder = StateGraph(MessagesState)
graph_builder.add_node("assistant", assistant)
graph_builder.add_node("tools", ToolNode([get_weather]))
graph_builder.add_edge(START, "assistant")
graph_builder.add_conditional_edges("assistant", tools_condition)
graph_builder.add_edge("tools", "assistant")
graph = graph_builder.compile()
The assistant node calls the LLM. If the LLM wants to use a tool, tools_condition routes to the tools node. After the tool runs, control returns to the assistant. When the LLM responds without a tool call, the graph ends.
And the requirements file keeps it simple:
langgraph>=0.4
langchain-openai>=0.3
langchain-core>=0.3
Tip: Pin your dependency versions in production. Using >= is fine for tutorials, but production apps should lock exact versions like langgraph==0.4.3. One upstream breaking change can take down your agent.
Running LangGraph Server Locally
You’ve got three files. Time to see if they actually work. The CLI gives you two commands: langgraph dev for quick testing, and langgraph up for a Docker-based setup that mirrors production.
I’d recommend starting with langgraph dev. It’s the fastest path — no Docker, no database. It starts an in-memory server that’s perfect for trying things out.
langgraph dev
The output tells you everything you need:
Ready!
- API: http://127.0.0.1:2024
- Docs: http://127.0.0.1:2024/docs
- LangGraph Studio Web UI: https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024
Your agent is now an API on port 2024. The server read langgraph.json, found your graph, set up all the endpoints, and started listening. Open http://127.0.0.1:2024/docs in a browser to see the full API documentation.
Want persistent state that survives restarts? Switch to langgraph up:
langgraph up
This builds a Docker image and starts PostgreSQL for state storage. It takes longer the first time, but your agent’s conversations survive restarts. That’s the setup you’d use for staging.
Warning: langgraph dev stores everything in memory. When the process stops, all threads, checkpoints, and conversation history vanish. Use it for development only.
Note: What langgraph up actually runs: behind the scenes, it creates a Docker Compose stack with two containers — your agent server and a PostgreSQL database. The server connects to PostgreSQL for checkpoint storage, giving you the same persistence model as a cloud deployment.
Talking to the Server with the LangGraph SDK
The server is up. How do you send it messages? You could fire raw HTTP requests — the server exposes a full REST API. But the LangGraph SDK handles authentication, serialization, and streaming for you.
Here’s the SDK client connecting to your local server. The get_client function takes a URL and returns a client you’ll use for everything.
from langgraph_sdk import get_client
client = get_client(url="http://127.0.0.1:2024")
# Check which graphs are available
assistants = await client.assistants.search()
print(assistants)
The response confirms your agent is registered:
[{'assistant_id': 'agent', 'graph_id': 'agent', ...}]
Every graph in langgraph.json shows up as an “assistant” on the server. The names match.
Creating Threads and Sending Messages
What’s a “thread”? It’s a conversation container. It holds all the messages and state for one interaction. You create a thread, then send runs (messages) to it.
The client.runs.stream method sends a message to your agent and yields events as the graph executes. Each event carries a piece of the response.
# Create a new conversation thread
thread = await client.threads.create()
print(f"Thread ID: {thread['thread_id']}")
# Send a message and stream the response
async for event in client.runs.stream(
thread_id=thread["thread_id"],
assistant_id="agent",
input={"messages": [{"role": "user", "content": "What's the weather in Tokyo?"}]},
):
if event.event == "values":
messages = event.data.get("messages", [])
if messages:
last = messages[-1]
print(f"[{last['type']}]: {last.get('content', '')}")
Here’s what streams back:
[ai]:
[tool]: Sunny, 22C
[ai]: The weather in Tokyo is currently sunny with a temperature of 22C.
Three events tell the story. The LLM decided to call the weather tool (empty AI message with tool call metadata). The tool returned “Sunny, 22C.” Then the LLM composed a natural response from the tool result.
Thread State Persists Automatically
Here’s where it gets interesting. Send a follow-up on the same thread — without repeating any context:
async for event in client.runs.stream(
thread_id=thread["thread_id"],
assistant_id="agent",
input={"messages": [{"role": "user", "content": "How about London?"}]},
):
if event.event == "values":
messages = event.data.get("messages", [])
if messages:
last = messages[-1]
print(f"[{last['type']}]: {last.get('content', '')}")
And you get:
[ai]:
[tool]: Cloudy, 15C
[ai]: The weather in London is currently cloudy with a temperature of 15C.
The agent understood “How about London?” because the server loaded the thread’s full history before running the graph. You didn’t pass previous messages. The server handled that. This is how production chatbots work.
Key Insight: The server manages state, not your client. Your client sends a message and a thread ID. The server loads history, runs the graph with full context, saves the updated state, and streams the result. Your client stays stateless.
LangGraph Platform API Endpoints — What You Get for Free
You don’t get a single “invoke” endpoint. You get 30+ endpoints organized around five resources. Here’s the map:
| Resource | What It Manages | Key Endpoints |
|---|---|---|
| Assistants | Graph configurations | POST /assistants, GET /assistants/search |
| Threads | Conversation state | POST /threads, GET /threads/{id}/state |
| Runs | Graph executions | POST /runs, POST /runs/stream, POST /runs/wait |
| Cron Jobs | Scheduled runs | POST /threads/{id}/runs/crons |
| Store | Long-term memory | PUT /store/items, POST /store/items/search |
The three run modes deserve a closer look:
- POST /runs — background. Returns immediately with a run ID while the graph executes in the task queue. Best for tasks that take minutes.
- POST /runs/stream — streaming. Yields events as the graph executes. Best for chatbots.
- POST /runs/wait — blocking. Waits for the final output. Best for quick queries under 30 seconds.
Want to see what’s happening under the hood? Here’s the same interaction via curl:
curl -X POST http://127.0.0.1:2024/threads \
-H "Content-Type: application/json" \
-d '{}'
Response:
{"thread_id": "abc123-...", "created_at": "...", "metadata": {}}
Then invoke the agent on that thread:
curl -X POST http://127.0.0.1:2024/threads/abc123/runs/wait \
-H "Content-Type: application/json" \
-d '{
"assistant_id": "agent",
"input": {"messages": [{"role": "user", "content": "Weather in New York?"}]}
}'
The response contains the full graph output — tool calls, tool results, and the final answer. The SDK wraps exactly these HTTP calls in a cleaner interface.
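If you want to see the exact HTTP the SDK is wrapping, here is a standard-library sketch. The build_run_request helper is a hypothetical name of mine; the endpoint and payload shape mirror the curl call above:

```python
import json
from urllib import request

BASE = "http://127.0.0.1:2024"  # local `langgraph dev` server

def build_run_request(thread_id: str, text: str) -> request.Request:
    """Build the same blocking-run POST the SDK issues under the hood."""
    body = json.dumps({
        "assistant_id": "agent",
        "input": {"messages": [{"role": "user", "content": text}]},
    }).encode()
    return request.Request(
        f"{BASE}/threads/{thread_id}/runs/wait",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually send it, the server must be running:
# with request.urlopen(build_run_request("abc123", "Weather in New York?")) as resp:
#     print(json.load(resp))
```

In practice you would use the SDK, but seeing the raw request makes it clear there is no magic: a thread ID in the URL, an assistant ID and input in the body.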
Background Runs for Long-Running Tasks
What if your agent needs five minutes to research a topic and write a report? You don’t want an HTTP connection hanging that long.
LangGraph Server solves this with background runs. You kick off the run, get a run ID immediately, and check back later. The server executes your graph in a task queue behind the scenes.
The client.runs.create method starts a background run and returns right away with the run’s metadata:
run = await client.runs.create(
thread_id=thread["thread_id"],
assistant_id="agent",
input={"messages": [{"role": "user", "content": "What's the weather in all three cities?"}]},
)
print(f"Run ID: {run['run_id']}")
print(f"Status: {run['status']}")
Immediate response:
Run ID: run-456...
Status: pending
Check on it later:
run_status = await client.runs.get(
thread_id=thread["thread_id"],
run_id=run["run_id"],
)
print(f"Status: {run_status['status']}")
Status: success
Once the status is success, read the thread state for the result. This is how you handle production workloads where blocking a web server thread on agent execution isn’t an option.
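If you do need manual polling (say, from a worker that can't hold a connection open), the loop is simple. This sketch is plain Python with a fake status callable standing in for client.runs.get, so it runs without a live server:

```python
import time

def poll_until_done(get_status, timeout=300.0, base_delay=0.5):
    """Poll a status callable until it leaves the pending/running states.
    `get_status` stands in for: lambda: client.runs.get(...)['status']."""
    delay, waited = base_delay, 0.0
    while waited < timeout:
        status = get_status()
        if status not in ("pending", "running"):
            return status
        time.sleep(delay)
        waited += delay
        delay = min(delay * 2, 10.0)  # exponential backoff, capped at 10s
    raise TimeoutError("run did not finish within the timeout")

# Demo with a canned status sequence instead of a live server:
statuses = iter(["pending", "running", "success"])
print(poll_until_done(lambda: next(statuses), base_delay=0.01))  # success
```

The backoff keeps you from hammering the server on long runs while still reacting quickly to fast ones.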
Tip: Use client.runs.join instead of manual polling. The call await client.runs.join(thread_id, run_id) blocks until the run finishes. It polls internally so you don’t write retry logic yourself.
LangGraph Platform Deployment Options — Local, Cloud, and Self-Hosted
We’ve run the server locally. For production, you pick from four options. Each trades convenience for control.
| Option | Where It Runs | Best For | State Persistence | Cost |
|---|---|---|---|---|
| langgraph dev | Your machine | Development | In-memory only | Free |
| Cloud SaaS | LangSmith infrastructure | Fast deploy, small teams | Managed PostgreSQL | Usage-based |
| BYOC | Your AWS/GCP VPC | Data residency needs | Your database | License |
| Self-Hosted | Your infrastructure | Maximum control | Your database | License |
Cloud Deployment
Cloud SaaS is the fastest path to production. Your code lives in GitHub. The platform builds and deploys it.
The steps:
- Push your project (with
langgraph.json) to GitHub. - Connect the repo in the LangSmith console.
- Set environment variables (API keys) in deployment settings.
- Click deploy.
The platform builds a Docker image, provisions PostgreSQL, and gives you a URL. Your SDK client code changes by exactly two fields:
cloud_client = get_client(
url="https://your-deployment-id.us.langgraph.app",
api_key="your-langsmith-api-key",
)
# Everything else is identical
thread = await cloud_client.threads.create()
async for event in cloud_client.runs.stream(
thread_id=thread["thread_id"],
assistant_id="agent",
input={"messages": [{"role": "user", "content": "Weather in Tokyo?"}]},
):
print(event.data)
That’s the SDK’s big win. Same code, different URL. Local or cloud — the client doesn’t care.
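A common pattern is to resolve the URL and key from the environment, so identical client code runs against localhost or the cloud. Note that LANGGRAPH_API_URL here is a naming convention of this sketch, not a variable the SDK reads on its own:

```python
import os

def client_settings() -> dict:
    """Resolve connection settings from the environment, defaulting to
    the local `langgraph dev` server. LANGGRAPH_API_URL is our own
    convention; the SDK just takes whatever url/api_key it is given."""
    settings = {"url": os.environ.get("LANGGRAPH_API_URL", "http://127.0.0.1:2024")}
    api_key = os.environ.get("LANGSMITH_API_KEY")  # only needed for cloud deployments
    if api_key:
        settings["api_key"] = api_key
    return settings

# client = get_client(**client_settings())  # identical call, local or cloud
print(client_settings()["url"])
```

This keeps deployment targets out of your code entirely: CI, staging, and production differ only in environment variables.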
Self-Hosted with Docker
Need to run on your own servers? Build a Docker image from your project:
langgraph build -t my-agent-server
This gives you a portable image. Push it to any container registry. Deploy on Kubernetes, ECS, Cloud Run — anywhere Docker runs.
The server needs PostgreSQL for state. Pass the connection string as an environment variable:
docker run -p 8123:8000 \
-e OPENAI_API_KEY="your-key" \
-e DATABASE_URI="postgresql://user:pass@host:5432/langgraph" \
my-agent-server
Warning: Never bake API keys into Docker images. Use environment variables, secrets managers (AWS Secrets Manager, Vault), or Kubernetes secrets. A leaked key in a container image is a production incident.
Assistants — Versioning Without Redeploying
Here’s a scenario you’ll hit quickly in production. You want to A/B test two system prompts. Or your customer success team wants a friendlier tone while the developer API stays technical.
Assistants solve this. An assistant is a named configuration of your graph. Same graph code, different behavior.
Create a custom assistant with client.assistants.create. The config parameter passes overrides your graph reads at runtime:
assistant = await client.assistants.create(
graph_id="agent",
config={
"configurable": {
"system_prompt": "You are a concise weather bot. Temperature in Celsius and Fahrenheit."
}
},
name="weather-expert",
)
print(f"Assistant ID: {assistant['assistant_id']}")
Assistant ID: asst-789...
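One catch worth flagging: the agent.py shown earlier never reads system_prompt, so a config override only changes behavior once a node actually consumes it. Here is a minimal sketch of the graph-side half. The resolve_system_prompt helper is a hypothetical name of mine, and the commented node shows where it would plug into the assistant node from agent.py:

```python
def resolve_system_prompt(config: dict, default: str = "You are a helpful assistant.") -> str:
    """Pull the per-assistant override out of the RunnableConfig-shaped
    dict that LangGraph passes to each node, falling back to a default."""
    return (config or {}).get("configurable", {}).get("system_prompt", default)

# How the assistant node from agent.py could consume it (sketch):
# def assistant(state: MessagesState, config):
#     prompt = resolve_system_prompt(config)
#     messages = [{"role": "system", "content": prompt}, *state["messages"]]
#     return {"messages": [llm_with_tools.invoke(messages)]}

print(resolve_system_prompt({"configurable": {"system_prompt": "Be brief."}}))  # Be brief.
print(resolve_system_prompt({}))  # You are a helpful assistant.
```

With the node reading the config this way, every assistant created from the same graph gets its own prompt at runtime.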
Need to tweak the prompt later? Update it through the API — no redeployment:
updated = await client.assistants.update(
assistant_id=assistant["assistant_id"],
config={
"configurable": {
"system_prompt": "You are a detailed weather bot. Include temperature, humidity, and wind."
}
},
)
This is how prompt engineering works in production. Push code once. Refine behavior through the API. I prefer this over redeploying for every prompt change — it’s faster and lower risk.
Tip: Use assistants to separate concerns. One graph, multiple assistants: a “support-bot” for customers, a “dev-bot” for internal use, a “test-bot” for QA. Each has its own system prompt, model choice, and tool config. Zero code duplication.
Streaming Modes — Pick Your Granularity
Not all streaming is equal. A chatbot needs token-by-token output. A monitoring dashboard wants state changes. A debugger wants everything. The server supports four modes:
| Mode | What Streams | Use Case |
|---|---|---|
| values | Full state after each node | Debugging, full visibility |
| messages | LLM tokens one-by-one | Chatbot UIs |
| updates | Only changes per node | Monitoring dashboards |
| events | Internal LangGraph events | Advanced custom logic |
For a chatbot, use messages mode. It streams individual tokens as the LLM generates them — the same experience as ChatGPT:
async for event in client.runs.stream(
thread_id=thread["thread_id"],
assistant_id="agent",
input={"messages": [{"role": "user", "content": "Tell me about Tokyo's weather"}]},
stream_mode="messages",
):
if hasattr(event, 'data') and event.data:
if isinstance(event.data, dict) and "content" in event.data:
print(event.data["content"], end="", flush=True)
Each call to print renders a single token. Your frontend shows text appearing word by word.
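On the client side, rendering messages mode comes down to concatenating chunks. Here is a minimal, framework-free sketch, assuming each event's data is a dict with an optional content field as in the loop above:

```python
def render_stream(chunks) -> str:
    """Join streamed token chunks into the final text a chat UI shows.
    Each chunk is a dict that may or may not carry a content field."""
    parts = []
    for chunk in chunks:
        content = chunk.get("content")
        if content:
            parts.append(content)
            # a real UI would flush each piece to the screen here
    return "".join(parts)

print(render_stream([{"content": "Tok"}, {"content": "yo is "}, {"type": "done"}, {"content": "sunny."}]))
# Tokyo is sunny.
```

The same accumulate-and-flush shape works in a browser: append each chunk to the DOM as it arrives and the full message assembles itself.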
For debugging, switch to values mode. It dumps the full state after each node runs:
async for event in client.runs.stream(
thread_id=thread["thread_id"],
assistant_id="agent",
input={"messages": [{"role": "user", "content": "Weather in London?"}]},
stream_mode="values",
):
if event.event == "values":
msgs = event.data.get("messages", [])
print(f"--- {len(msgs)} messages in state ---")
for m in msgs:
print(f" [{m['type']}]: {m.get('content', '[tool call]')[:60]}")
You see the state growing step by step — user message, LLM tool call, tool response, final answer. When something goes wrong, this is how you find it.
Common Mistakes and How to Fix Them
Mistake 1: Wrong Graph Variable Name in langgraph.json
{
"graphs": {
"agent": "./agent.py:app"
}
}
Why it breaks: Your file has graph = graph_builder.compile() but the config says :app. The server can’t find the object and throws an import error.
The fix: Match the name after the colon to your actual variable:
{
"graphs": {
"agent": "./agent.py:graph"
}
}
Mistake 2: Running langgraph dev in Production
# This will lose user data on every restart
langgraph dev --host 0.0.0.0
Why it breaks: In-memory storage. Every restart wipes all threads and conversations. Your users lose everything.
The fix:
langgraph up
Mistake 3: Missing Environment Variables
langgraph dev
# Server starts fine, but first request crashes:
# ERROR: OPENAI_API_KEY not set
Why it’s confusing: The server boots without API keys. It only fails when the first request reaches the LLM. In Docker, this is even harder to debug.
The fix: Create a .env file and reference it in langgraph.json:
# .env
OPENAI_API_KEY=sk-your-key-here
LANGSMITH_API_KEY=your-langsmith-key
Warning: Make sure your .env file is in .gitignore. It’s easy to commit API keys to your repo when the .env file sits next to langgraph.json. Add .env to .gitignore before your first commit.
Exercise 1: Deploy and Query Your Agent
You’ve seen the pieces. Put them together yourself. Deploy the weather agent locally and run a multi-turn conversation.
Instructions: Using the LangGraph SDK, create a thread, ask about the weather in Tokyo, then follow up by asking about London on the SAME thread. Print both AI responses. The agent should call the get_weather tool for each city.
Starter code:
from langgraph_sdk import get_client
client = get_client(url="http://127.0.0.1:2024")
# Step 1: Create a thread
thread = await client.threads.create()
# Step 2: Send first message about Tokyo
# YOUR CODE HERE — use client.runs.stream
# Step 3: Send follow-up about London on the SAME thread
# YOUR CODE HERE
print("DONE")
Hints:
- Use client.runs.stream with thread_id=thread["thread_id"], assistant_id="agent", and input={"messages": [{"role": "user", "content": "..."}]}.
- For the follow-up, use the same thread_id. The server loads conversation history automatically.
Solution:
from langgraph_sdk import get_client
client = get_client(url="http://127.0.0.1:2024")
thread = await client.threads.create()
async for event in client.runs.stream(
    thread_id=thread["thread_id"],
    assistant_id="agent",
    input={"messages": [{"role": "user", "content": "What is the weather in Tokyo?"}]},
    stream_mode="values",
):
    if event.event == "values":
        msgs = event.data.get("messages", [])
        if msgs:
            print(f"Tokyo: {msgs[-1].get('content', '')}")
async for event in client.runs.stream(
    thread_id=thread["thread_id"],
    assistant_id="agent",
    input={"messages": [{"role": "user", "content": "How about London?"}]},
    stream_mode="values",
):
    if event.event == "values":
        msgs = event.data.get("messages", [])
        if msgs:
            print(f"London: {msgs[-1].get('content', '')}")
print("DONE")
Why it works: Reusing the same thread_id is the key. The server loads the full conversation history, so the agent understands that "How about London?" refers to weather from the previous message.
Exercise 2: Create a Custom Assistant
Now try creating your own assistant with a custom personality.
Instructions: Create a new assistant named "brief-weather-bot" from the "agent" graph. Give it a system prompt: "Reply with just the city and temperature, nothing else." Then query it about New York weather.
Starter code:
from langgraph_sdk import get_client
client = get_client(url="http://127.0.0.1:2024")
# Step 1: Create a custom assistant
assistant = await client.assistants.create(
    # YOUR CODE HERE
)
# Step 2: Create a thread and query YOUR assistant
thread = await client.threads.create()
# YOUR CODE HERE
print("DONE")
Hints:
- assistants.create needs graph_id="agent", name="brief-weather-bot", and config={"configurable": {"system_prompt": "..."}}.
- In client.runs.stream, use assistant["assistant_id"] — not "agent" — as the assistant_id.
Solution:
from langgraph_sdk import get_client
client = get_client(url="http://127.0.0.1:2024")
assistant = await client.assistants.create(
    graph_id="agent",
    name="brief-weather-bot",
    config={"configurable": {"system_prompt": "Reply with just the city and temperature, nothing else."}},
)
thread = await client.threads.create()
async for event in client.runs.stream(
    thread_id=thread["thread_id"],
    assistant_id=assistant["assistant_id"],
    input={"messages": [{"role": "user", "content": "Weather in New York?"}]},
    stream_mode="values",
):
    if event.event == "values":
        msgs = event.data.get("messages", [])
        if msgs:
            print(msgs[-1].get("content", ""))
print("DONE")
Why it works: Creating an assistant from the same graph gives you different behavior without touching graph code. The assistant_id from the response replaces the default "agent" name in your run calls.
When NOT to Use LangGraph Platform
LangGraph Platform is powerful. But it’s overkill for some situations.
Simple stateless chains. If your workflow is prompt-in, response-out with no memory or tools, a plain FastAPI endpoint is simpler and cheaper.
Sub-10ms latency requirements. The server adds overhead for state management, checkpointing, and the HTTP layer. For latency-critical paths, call the LLM directly.
Tip: Start with langgraph dev to validate your architecture. If the local server handles your needs, then pick a production option. The self-hosted lite tier is free for up to 1 million node executions.
Existing framework investment. If your team already uses CrewAI or AutoGen, migrating just for deployment isn’t worth it. Use BentoML or a FastAPI wrapper instead.
Vendor lock-in concerns. The cloud SaaS ties you to LangChain’s infrastructure. If that’s a dealbreaker, the self-hosted Docker option runs independently.
Complete Code
Full project files (copy-paste and run):
# agent.py
# Complete code from: LangGraph Platform — Deploying Agents as APIs
# Requires: pip install langgraph langchain-openai langchain-core
# Python 3.10+
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import ToolNode, tools_condition
from langchain_core.tools import tool
@tool
def get_weather(city: str) -> str:
"""Get the current weather for a city."""
weather_data = {
"London": "Cloudy, 15C",
"Tokyo": "Sunny, 22C",
"New York": "Rainy, 18C",
}
return weather_data.get(city, f"No data for {city}")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
llm_with_tools = llm.bind_tools([get_weather])
def assistant(state: MessagesState):
"""Call the LLM with tool bindings."""
return {"messages": [llm_with_tools.invoke(state["messages"])]}
graph_builder = StateGraph(MessagesState)
graph_builder.add_node("assistant", assistant)
graph_builder.add_node("tools", ToolNode([get_weather]))
graph_builder.add_edge(START, "assistant")
graph_builder.add_conditional_edges("assistant", tools_condition)
graph_builder.add_edge("tools", "assistant")
graph = graph_builder.compile()
# langgraph.json
{
"dependencies": ["."],
"graphs": {
"agent": "./agent.py:graph"
},
"env": ".env"
}
# client.py — Interact with the deployed agent
# Requires: pip install langgraph-sdk
# Start the server first: langgraph dev
import asyncio
from langgraph_sdk import get_client
async def main():
client = get_client(url="http://127.0.0.1:2024")
thread = await client.threads.create()
print(f"Thread: {thread['thread_id']}")
# First message
async for event in client.runs.stream(
thread_id=thread["thread_id"],
assistant_id="agent",
input={"messages": [{"role": "user", "content": "What's the weather in Tokyo?"}]},
stream_mode="values",
):
if event.event == "values":
msgs = event.data.get("messages", [])
if msgs:
print(f"[{msgs[-1]['type']}]: {msgs[-1].get('content', '')}")
# Follow-up on same thread
async for event in client.runs.stream(
thread_id=thread["thread_id"],
assistant_id="agent",
input={"messages": [{"role": "user", "content": "How about London?"}]},
stream_mode="values",
):
if event.event == "values":
msgs = event.data.get("messages", [])
if msgs:
print(f"[{msgs[-1]['type']}]: {msgs[-1].get('content', '')}")
asyncio.run(main())
Summary
LangGraph Platform turns your notebook agent into a production API. Point langgraph.json at your graph, run langgraph dev, and you get a server with streaming, state persistence, and thread management.
The SDK is your interface. Create threads, send messages, stream responses — same code whether you’re hitting localhost or a cloud deployment. Background runs handle slow tasks. Assistants let you version behavior without redeploying code.
Practice exercise: Build a multi-tool agent — add a get_stock_price tool alongside get_weather. Deploy it with langgraph dev. Create two assistants with different system prompts (one concise, one detailed). Query each with the same question and compare the output.
Solution outline:
1. Add a `get_stock_price` tool with the `@tool` decorator and mock data.
2. Bind both tools: `llm.bind_tools([get_weather, get_stock_price])`.
3. Update `ToolNode` to include both tools.
4. Deploy with `langgraph dev`.
5. Create two assistants via `client.assistants.create()` with different prompts.
6. Create threads for each, send the same question, and compare.
Frequently Asked Questions
Can I use LangGraph Server without LangSmith?
Yes. Self-hosted options (Docker via langgraph up or langgraph build) run on your infrastructure with no LangSmith dependency. You need LangSmith only for cloud SaaS and the Studio web UI.
Does the server support JavaScript agents?
LangGraph Server runs Python agents only. But the SDK client comes in both Python and JavaScript. Your agent runs in Python; your Node.js app talks to it via the JS SDK: import { Client } from "@langchain/langgraph-sdk".
How does pricing work?
Self-hosted lite is free up to 1 million node executions. Cloud SaaS is usage-based through LangSmith plans. BYOC and enterprise require a license. See LangChain’s pricing page for current rates.
Can I run multiple agents on one server?
Yes. Add multiple entries to "graphs" in langgraph.json. Each becomes a separate assistant. Clients address each by name — useful when you have specialized agents sharing infrastructure.
{
"graphs": {
"weather-agent": "./agents/weather.py:graph",
"support-agent": "./agents/support.py:graph"
}
}