
LangGraph Platform: Deploy Agents as APIs

Deploy LangGraph agents as scalable APIs with LangGraph Server — step-by-step guide covering SDK setup, streaming, background runs, and cloud hosting.

Written by Selva Prabhakaran | 24 min read

LangGraph Platform wraps your agent in a ready-made API server so any app can talk to it — here’s how to go from notebook to live service in minutes.

So you’ve got a working LangGraph agent on your laptop. Great. But a Slack bot, a React frontend, or a cron job can’t just import your Python file and run it. They need an API to call.

Building that API from scratch is more work than you’d think. You need REST routes, a database for chat state, a streaming layer, and a queue for slow tasks. LangGraph Platform bundles all of that into one package. In this guide, I’ll walk you through deploying an agent as an API and calling it from a Python client.

Before we write code, let me lay out how the parts fit together.

Think of your LangGraph graph as the engine of a car. LangGraph Server is the chassis — it takes that engine and puts it behind a set of HTTP endpoints. Clients send requests; the server loads the correct graph, runs it, stores state in a database, and pushes the output back. An SDK client library lets you skip raw HTTP and work with clean Python methods instead. The term “LangGraph Platform” covers everything: the server, the CLI, the visual Studio debugger, and cloud hosting options.

One detail worth highlighting: the server checkpoints state at every step of a run. If your server restarts, nothing is lost. The agent picks up from its last saved point.

What Is LangGraph Platform?

In short, it’s the “go live” layer for LangGraph agents. Hand it a graph, and it turns that graph into a running service — no need to write your own FastAPI routes, stand up a database, or rig streaming from scratch.

Four pieces make up the platform:

  • LangGraph Server — provides 30+ REST endpoints covering threads, runs, streaming, assistants, and cron jobs.
  • LangGraph SDK — gives you Python and JavaScript clients that wrap those endpoints.
  • LangGraph CLI — lets you build, test, and launch servers from the terminal.
  • LangGraph Studio — a visual workspace where you test and debug graphs in real time.

Why go through all this? Because agents aren’t typical web services. They hold state across many requests. They stream tokens one at a time. Some tasks run for minutes, not seconds. Wiring each of those features by hand is a project in itself.

Key Insight: LangGraph Platform goes far beyond simple hosting. It tackles the hard stuff — saving state across requests, running tasks that take minutes in the background, pushing tokens to clients one by one, and scaling across machines — so you don’t have to.

Prerequisites

  • Python version: 3.10+
  • Required libraries: langgraph (0.4+), langgraph-sdk (0.1.51+), langchain-openai (0.3+), langchain-core (0.3+)
  • Install: pip install langgraph langgraph-cli langgraph-sdk langchain-openai langchain-core
  • API key: An OpenAI API key set as OPENAI_API_KEY. See OpenAI’s docs to create one.
  • Docker: Required for langgraph up. Install from docker.com.
  • Time to complete: ~40 minutes
  • Prior knowledge: Basic LangGraph concepts (nodes, edges, state) from earlier posts in this series.

How Do You Structure a Project for LangGraph Server?

If you’ve ever tried shipping a Python script and then realized it needs a proper folder layout, this will feel familiar. The server has to know two things: where your graph object lives, and what packages it depends on.

At a minimum, you need three files:

text
my-agent/
  ├── agent.py            # Your graph definition
  ├── langgraph.json      # Server configuration
  └── requirements.txt    # Python dependencies

Let me walk through each one. The config file langgraph.json is the entry point the server reads when it boots. Here’s the shortest version that works:

json
{
  "dependencies": ["."],
  "graphs": {
    "agent": "./agent.py:graph"
  },
  "env": ".env"
}

What do these fields mean? "graphs" is a name-to-path map. The key "agent" is the public name clients will use. The value "./agent.py:graph" tells the server to look for a variable called graph inside agent.py. "dependencies" says “install from the current directory.” And "env" points at a file holding secrets like API keys.
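To make these rules concrete, here is a small illustrative helper (my own, not part of the CLI or SDK) that sanity-checks the two fields the server cares about most:

```python
# Illustrative config check -- check_config is a custom helper, not a
# platform API. It verifies that at least one graph is registered and
# that every graph value uses the "<file>:<variable>" form.
import json

def check_config(text: str) -> list[str]:
    cfg = json.loads(text)
    problems = []
    if not cfg.get("graphs"):
        problems.append("no graphs registered")
    for name, spec in cfg.get("graphs", {}).items():
        if ":" not in spec:
            problems.append(f"graph '{name}' lacks a ':variable' suffix")
    return problems

good = '{"dependencies": ["."], "graphs": {"agent": "./agent.py:graph"}, "env": ".env"}'
bad = '{"graphs": {"agent": "./agent.py"}}'
print(check_config(good))  # → []
print(check_config(bad))   # → ["graph 'agent' lacks a ':variable' suffix"]
```

Running a check like this before langgraph dev catches the most common config typo (a missing `:variable` suffix) in seconds rather than at server boot.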

Next up: the agent itself. This one uses an LLM that can call a weather tool. It follows the ReAct loop — at each step the LLM either calls a tool or writes a final answer.

python
# agent.py
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import ToolNode, tools_condition
from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    # Simplified for demo — real apps call a weather API
    weather_data = {
        "London": "Cloudy, 15C",
        "Tokyo": "Sunny, 22C",
        "New York": "Rainy, 18C",
    }
    return weather_data.get(city, f"No data for {city}")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
llm_with_tools = llm.bind_tools([get_weather])

def assistant(state: MessagesState):
    """Call the LLM with tool bindings."""
    return {"messages": [llm_with_tools.invoke(state["messages"])]}

graph_builder = StateGraph(MessagesState)
graph_builder.add_node("assistant", assistant)
graph_builder.add_node("tools", ToolNode([get_weather]))
graph_builder.add_edge(START, "assistant")
graph_builder.add_conditional_edges("assistant", tools_condition)
graph_builder.add_edge("tools", "assistant")

graph = graph_builder.compile()

Here’s the flow: the assistant node asks the LLM what to do. If the LLM picks the weather tool, tools_condition steers the graph to the tools node. Once the tool finishes, control loops back to assistant. When the LLM answers without calling any tool, the graph wraps up.
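The branch tools_condition takes can be pictured with a tiny framework-free sketch of the same decision (the message dict shape here is simplified, and "__end__" stands in for LangGraph's END sentinel):

```python
# Simplified routing decision: an AI message that carries tool calls
# sends the graph to the "tools" node; otherwise the run ends.
def route(last_ai_message: dict) -> str:
    if last_ai_message.get("tool_calls"):
        return "tools"
    return "__end__"

# LLM decided to call a tool -> route to the tools node
print(route({"content": "", "tool_calls": [{"name": "get_weather", "args": {"city": "Tokyo"}}]}))
# LLM produced a plain answer -> finish the run
print(route({"content": "Sunny, 22C", "tool_calls": []}))
```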

Finally, the dependency file is just three lines:

text
langgraph>=0.4
langchain-openai>=0.3
langchain-core>=0.3

Tip: Lock exact versions before you ship. Loose pins like >= work fine while learning, but in a live app you want langgraph==0.4.3 to avoid surprise breakage from upstream changes.

How Do You Start the Server on Your Machine?

You now have three files. Let’s fire things up. The CLI ships two commands: langgraph dev for fast, in-memory testing, and langgraph up for a Docker-backed setup that behaves like a real deployment.

My advice: start with langgraph dev. No Docker needed, no database to spin up. It launches a lightweight server you can hit right away.

bash
langgraph dev

You’ll see output like this:

text
Ready!
- API: http://127.0.0.1:2024
- Docs: http://127.0.0.1:2024/docs
- LangGraph Studio Web UI: https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024

Just like that, your agent has an API on port 2024. The server parsed langgraph.json, found the graph, stood up every endpoint, and is now listening. Visit http://127.0.0.1:2024/docs to browse the full API reference.

If you want data that sticks around after a restart, use langgraph up instead:

bash
langgraph up

This one builds a Docker image and brings up a PostgreSQL container alongside it. The first build takes a bit, but after that your chats survive server restarts — exactly what you’d want in a staging or QA setting.

Warning: langgraph dev stores everything in memory. The moment you stop the process, every thread, every checkpoint, and every chat log disappears. Keep it for coding and testing only.

Note: What langgraph up actually does behind the curtain: it creates a Docker Compose stack with two services — your agent and a PostgreSQL instance. The server writes checkpoints to PostgreSQL, giving you the same data model you’d get in the cloud.
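Conceptually, the stack resembles a Compose file like the sketch below. This is illustrative only: the CLI generates its own configuration, and the service names, image name, and credentials shown are assumptions.

```yaml
# Illustrative sketch, not the file langgraph up generates.
services:
  langgraph-api:
    image: my-agent-server            # image built from your project
    ports:
      - "8123:8000"
    environment:
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      DATABASE_URI: postgresql://postgres:postgres@langgraph-postgres:5432/postgres
    depends_on:
      - langgraph-postgres
  langgraph-postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: postgres
    volumes:
      - pg-data:/var/lib/postgresql/data   # checkpoints survive restarts
volumes:
  pg-data:
```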

How Do You Call the Server Using the LangGraph SDK?

Your server is up. Now you need a way to send it messages. Sure, you could craft raw HTTP calls — every endpoint is a standard REST route. But the SDK handles auth headers, JSON encoding, and streaming plumbing in one neat package.

Here’s how you connect. The get_client call takes a URL and hands back a client object you’ll use for everything:

python
from langgraph_sdk import get_client

client = get_client(url="http://127.0.0.1:2024")

# Check which graphs are available
assistants = await client.assistants.search()
print(assistants)

You should see something like:

text
[{'assistant_id': 'agent', 'graph_id': 'agent', ...}]

Every entry in the "graphs" section of langgraph.json registers as an “assistant” on the server. The names line up one-to-one.

How Do Threads and Messages Work?

A “thread” is simply a container for one conversation. It stores the full message history and all related state. You open a thread, then start runs against it — a run is one execution of the graph, typically triggered by a new user message.

When you call client.runs.stream, the SDK sends your message to the agent and yields events as the graph runs. Each event carries one piece of the reply:

python
# Create a new conversation thread
thread = await client.threads.create()
print(f"Thread ID: {thread['thread_id']}")

# Send a message and stream the response
async for event in client.runs.stream(
    thread_id=thread["thread_id"],
    assistant_id="agent",
    input={"messages": [{"role": "user", "content": "What's the weather in Tokyo?"}]},
):
    if event.event == "values":
        messages = event.data.get("messages", [])
        if messages:
            last = messages[-1]
            print(f"[{last['type']}]: {last.get('content', '')}")

The stream looks like this:

text
[ai]:
[tool]: Sunny, 22C
[ai]: The weather in Tokyo is currently sunny with a temperature of 22C.

Three events paint the full picture. First, the LLM chose to invoke the weather tool (you see an AI message with no text but tool-call data attached). Second, the tool ran and returned “Sunny, 22C.” Third, the LLM took that result and wrote a polished reply for the user.

Does the Server Remember Past Messages?

Yes — and you don’t have to lift a finger. Send a follow-up to the same thread without any history:

python
async for event in client.runs.stream(
    thread_id=thread["thread_id"],
    assistant_id="agent",
    input={"messages": [{"role": "user", "content": "How about London?"}]},
):
    if event.event == "values":
        messages = event.data.get("messages", [])
        if messages:
            last = messages[-1]
            print(f"[{last['type']}]: {last.get('content', '')}")

Output:

text
[ai]:
[tool]: Cloudy, 15C
[ai]: The weather in London is currently cloudy with a temperature of 15C.

The agent knew “How about London?” was about weather because the server pulled the thread’s full history before running the graph. You sent a single new message. The server filled in the rest. That’s the same pattern used by every real-world chatbot.

Key Insight: Your client never manages state. It sends a thread ID and a message. The server fetches the history, executes the graph with all the context, persists the new state, and pushes back the result. The client stays thin and stateless.
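The server-side loop behind that insight can be modeled in a few lines. This is a conceptual sketch, not the platform's actual implementation:

```python
# In-memory model of what the server does per run: load the thread's
# history, append the new input, execute the graph, persist the result.
threads: dict[str, list[dict]] = {}

def run_on_thread(thread_id: str, new_messages: list[dict], graph) -> list[dict]:
    history = threads.setdefault(thread_id, [])
    state = history + new_messages   # server supplies the context
    state = state + graph(state)     # graph returns messages to append
    threads[thread_id] = state       # persisted before responding
    return state

# A stand-in "graph" that just echoes the latest user message.
echo = lambda msgs: [{"role": "ai", "content": f"echo: {msgs[-1]['content']}"}]

run_on_thread("t1", [{"role": "user", "content": "Weather in Tokyo?"}], echo)
state = run_on_thread("t1", [{"role": "user", "content": "How about London?"}], echo)
print(len(state))  # → 4 (two user turns, two replies)
```

The client only ever supplied the two one-message inputs; everything else accumulated server-side under the thread ID.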

What Endpoints Come Built-In?

You don’t get a single /invoke route. The server ships with over 30 endpoints organized around five core resources:

  • Assistants — graph configurations. Key endpoints: POST /assistants, GET /assistants/search
  • Threads — conversation state. Key endpoints: POST /threads, GET /threads/{id}/state
  • Runs — graph executions. Key endpoints: POST /runs, POST /runs/stream, POST /runs/wait
  • Cron Jobs — scheduled runs. Key endpoints: POST /threads/{id}/runs/crons
  • Store — long-term memory. Key endpoints: PUT /store/items, POST /store/items/search

Let me zoom in on the three ways to kick off a run:

  • POST /runs — launches the graph in the background and hands back a run ID you poll later. Best when tasks need minutes.
  • POST /runs/stream — feeds you events while the graph works. Ideal for chat interfaces.
  • POST /runs/wait — blocks until the graph is done, then returns the final output. Good for fast queries that finish in seconds.

Curious how this looks at the HTTP level? Here is the same chat done with curl:

bash
curl -X POST http://127.0.0.1:2024/threads \
  -H "Content-Type: application/json" \
  -d '{}'

Response:

json
{"thread_id": "abc123-...", "created_at": "...", "metadata": {}}

Then trigger a run on that thread:

bash
curl -X POST http://127.0.0.1:2024/threads/abc123/runs/wait \
  -H "Content-Type: application/json" \
  -d '{
    "assistant_id": "agent",
    "input": {"messages": [{"role": "user", "content": "Weather in New York?"}]}
  }'

The JSON that comes back has the whole graph output — tool calls, tool results, and the final answer. The SDK wraps these exact HTTP calls in friendlier methods.

How Do Background Runs Help with Slow Tasks?

Picture an agent that needs several minutes to dig through sources and draft a report. Holding an HTTP connection open that long is asking for timeouts.

LangGraph Server offers background runs for exactly this. You fire off the task, receive a run ID on the spot, and come back later to grab the result. The server feeds your graph into a task queue and handles it behind the scenes.

client.runs.create starts a background run and returns right away:

python
run = await client.runs.create(
    thread_id=thread["thread_id"],
    assistant_id="agent",
    input={"messages": [{"role": "user", "content": "What's the weather in all three cities?"}]},
)
print(f"Run ID: {run['run_id']}")
print(f"Status: {run['status']}")

Instant response:

text
Run ID: run-456...
Status: pending

Later, check progress:

python
run_status = await client.runs.get(
    thread_id=thread["thread_id"],
    run_id=run["run_id"],
)
print(f"Status: {run_status['status']}")

Output:

text
Status: success

Once the status flips to success, read the thread state to get the answer. This is the right pattern for any workload where tying up a web server thread while the agent thinks is not an option.

Tip: Skip manual polling — use client.runs.join. Calling await client.runs.join(thread_id, run_id) waits until the run wraps up. It polls under the hood so you don’t need retry loops of your own.
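For cases where you do manage the wait yourself, the pattern is a simple poll-until-terminal loop. Here is a sketch with a fake status source standing in for client.runs.get; the set of terminal status names is an assumption:

```python
import asyncio

# Poll until the run reaches a terminal status. get_status stands in
# for an awaitable wrapper around client.runs.get(...)["status"].
async def wait_for_run(get_status, interval: float = 0.01) -> str:
    terminal = {"success", "error", "timeout", "interrupted"}
    while True:
        status = await get_status()
        if status in terminal:
            return status
        await asyncio.sleep(interval)   # back off between polls

async def demo() -> str:
    # Simulated status sequence for one background run.
    states = iter(["pending", "running", "success"])
    async def fake_get():
        return next(states)
    return await wait_for_run(fake_get)

print(asyncio.run(demo()))  # → success
```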

What Are the Ways to Go Live — Local, Cloud, and Self-Hosted?

We’ve been running on localhost. When it’s time to serve real users, you have four paths. Each trades simplicity for control:

  • langgraph dev — runs on your laptop. Best for coding and testing. State in RAM only. Free.
  • Cloud SaaS — runs on LangSmith servers. Best for quick launches and small teams. Managed PostgreSQL storage. Pay as you go.
  • BYOC — runs in your AWS/GCP VPC. Best for strict data rules. Your own database. License fee.
  • Self-Hosted — runs on your own machines. Best for total control. Your own database. License fee.

Cloud Path

Cloud SaaS is the shortest route to a live URL. Your code sits in a GitHub repo. The platform builds and ships it for you.

Four steps and you’re done:

  1. Push your project (including langgraph.json) to GitHub.
  2. Connect the repo inside the LangSmith dashboard.
  3. Fill in your API keys under deployment settings.
  4. Hit deploy.

The platform creates a Docker image, sets up PostgreSQL, and hands you a URL. Your client code changes in exactly two spots:

python
cloud_client = get_client(
    url="https://your-deployment-id.us.langgraph.app",
    api_key="your-langsmith-api-key",
)

# Everything else is identical
thread = await cloud_client.threads.create()
async for event in cloud_client.runs.stream(
    thread_id=thread["thread_id"],
    assistant_id="agent",
    input={"messages": [{"role": "user", "content": "Weather in Tokyo?"}]},
):
    print(event.data)

This is the SDK’s biggest selling point. Same code, swap the URL. Whether you point at localhost or the cloud, the client works the same way.
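One way to exploit that symmetry is to drive the connection settings from the environment. The variable names below are my own convention, not SDK requirements:

```python
import os

# Build the kwargs for get_client from environment variables so the
# same script runs against localhost or a cloud deployment unchanged.
def client_settings(env=os.environ) -> dict:
    url = env.get("LANGGRAPH_URL", "http://127.0.0.1:2024")
    settings = {"url": url}
    api_key = env.get("LANGSMITH_API_KEY")
    if api_key:                       # cloud deployments need a key
        settings["api_key"] = api_key
    return settings

print(client_settings({}))  # → {'url': 'http://127.0.0.1:2024'}
# Then: client = get_client(**client_settings())
```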

Self-Hosted with Docker

Want to run everything on your own servers? Build a container image from your project:

bash
langgraph build -t my-agent-server

That image is fully portable. Push it to any registry and deploy on Kubernetes, ECS, Cloud Run — anything that speaks Docker.

The server expects a PostgreSQL instance for state. Hand it the connection string through an env var:

bash
docker run -p 8123:8000 \
  -e OPENAI_API_KEY="your-key" \
  -e DATABASE_URI="postgresql://user:pass@host:5432/langgraph" \
  my-agent-server

Warning: Never bake secrets into your Docker images. Rely on env vars, a secrets manager like AWS Secrets Manager or HashiCorp Vault, or Kubernetes secrets. A key baked into an image layer is a breach waiting to happen.

How Do Assistants Let You Tweak Behavior Without Redeploying?

Sooner or later you’ll face this: you want to A/B test two system prompts. Or your customer team wants a friendly tone while the dev-facing API keeps things brief.

Assistants are the answer. An assistant is a named profile for your graph. Same code underneath, different settings on top.

Create one with client.assistants.create. The config dictionary carries the settings your graph reads at runtime:

python
assistant = await client.assistants.create(
    graph_id="agent",
    config={
        "configurable": {
            "system_prompt": "You are a concise weather bot. Temperature in Celsius and Fahrenheit."
        }
    },
    name="weather-expert",
)
print(f"Assistant ID: {assistant['assistant_id']}")

Output:

text
Assistant ID: asst-789...
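For that system_prompt to have any effect, a node in the graph has to read it. Here is a sketch of the message-building step such a node could use; the helper itself is illustrative, and the key name matches the one set when creating the assistant:

```python
# Prepend the assistant's configured system prompt to the thread
# history before calling the LLM. Illustrative helper, not SDK code.
def build_messages(state: dict, config: dict) -> list[dict]:
    prompt = config.get("configurable", {}).get(
        "system_prompt", "You are a helpful assistant."
    )
    return [{"role": "system", "content": prompt}] + state["messages"]

msgs = build_messages(
    {"messages": [{"role": "user", "content": "Weather in New York?"}]},
    {"configurable": {"system_prompt": "Be concise."}},
)
print(msgs[0]["content"])  # → Be concise.
```

In LangGraph, a node can accept the run's config as a second argument and read config["configurable"] from it; the list this helper builds is what you would hand to llm_with_tools.invoke.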

Want to adjust the prompt a week later? Hit the API — zero downtime, zero deploys:

python
updated = await client.assistants.update(
    assistant_id=assistant["assistant_id"],
    config={
        "configurable": {
            "system_prompt": "You are a detailed weather bot. Include temperature, humidity, and wind."
        }
    },
)

This is how prompt work actually plays out in live systems. You push code once and then refine the wording through API calls. I find this far better than redeploying every time a prompt changes — it’s quicker and lower risk.

Tip: Use assistants to separate use cases. One graph, many profiles: “support-bot” for end users, “dev-bot” for your team, “test-bot” for QA. Each carries its own system prompt, model pick, and tool list — without copying a single line of code.

How Do You Pick the Right Streaming Mode?

Different consumers want different levels of detail. A chatbot needs to show words appearing one at a time. A monitoring dashboard only cares about state diffs. A debugger wants the full picture. The server gives you four modes to choose from:

  • values — full state snapshot after each node. Typical use: debugging and auditing.
  • messages — individual LLM tokens as they’re produced. Typical use: chat UIs.
  • updates — only the fields each node changed. Typical use: live dashboards.
  • events — low-level LangGraph events. Typical use: custom pipeline logic.

For a chat interface, go with messages. It pushes each token the moment the LLM writes it — the live-typing feel you know from ChatGPT:

python
async for event in client.runs.stream(
    thread_id=thread["thread_id"],
    assistant_id="agent",
    input={"messages": [{"role": "user", "content": "Tell me about Tokyo's weather"}]},
    stream_mode="messages",
):
    if hasattr(event, 'data') and event.data:
        if isinstance(event.data, dict) and "content" in event.data:
            print(event.data["content"], end="", flush=True)

Every print renders one token. On the frontend, text appears word by word.

When something goes wrong, switch to values mode. It shows the full state after every node:

python
async for event in client.runs.stream(
    thread_id=thread["thread_id"],
    assistant_id="agent",
    input={"messages": [{"role": "user", "content": "Weather in London?"}]},
    stream_mode="values",
):
    if event.event == "values":
        msgs = event.data.get("messages", [])
        print(f"--- {len(msgs)} messages in state ---")
        for m in msgs:
            print(f"  [{m['type']}]: {m.get('content', '[tool call]')[:60]}")

Now you watch the state grow at each step — user message, LLM tool request, tool reply, final answer. When a run goes sideways, this view pinpoints exactly where things went off track.

Common Mistakes and How to Fix Them

Mistake 1: Wrong Graph Variable Name in langgraph.json

json
{
  "graphs": {
    "agent": "./agent.py:app"
  }
}

Why it breaks: Your code defines graph = graph_builder.compile(), but the config references :app. The server tries to import an object that doesn’t exist and throws an error.

The fix: Make the name after the colon match your actual variable:

json
{
  "graphs": {
    "agent": "./agent.py:graph"
  }
}

Mistake 2: Running langgraph dev in Production

bash
# This will lose user data on every restart
langgraph dev --host 0.0.0.0

Why it breaks: Everything lives in RAM. The next restart wipes every thread and every conversation. Users lose all their history.

The fix:

bash
langgraph up

Mistake 3: Missing Environment Variables

bash
langgraph dev
# Server starts fine, but first request crashes:
# ERROR: OPENAI_API_KEY not set

Why it fools you: The server boots without complaint even when keys are missing. The crash only shows up when the first real request reaches the LLM. Inside a Docker container, this is even trickier to spot.

The fix: Put a .env file in your project and reference it from langgraph.json:

text
# .env
OPENAI_API_KEY=sk-your-key-here
LANGSMITH_API_KEY=your-langsmith-key

Warning: Make sure .env is in .gitignore. When the secrets file sits next to langgraph.json, it’s dangerously easy to commit it. Add the exclusion before your very first commit.
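To surface a missing key at boot rather than at the first request, you can also add a guard at the top of agent.py. This is my own convention, not a platform feature:

```python
import os

# Fail fast: raise at import time if required secrets are absent,
# so langgraph dev refuses to start instead of crashing mid-request.
def assert_env(required: list[str], env=os.environ) -> None:
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {missing}")

# In agent.py, before constructing the LLM:
# assert_env(["OPENAI_API_KEY"])
assert_env(["OPENAI_API_KEY"], env={"OPENAI_API_KEY": "sk-demo"})  # passes
```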

Exercise 1: Deploy and Query Your Agent

Time to wire everything up yourself. Launch the weather agent locally, then hold a two-turn chat using the SDK.

typescript
{
  type: 'exercise',
  id: 'deploy-query-agent',
  title: 'Exercise 1: Deploy and Query Your Agent',
  difficulty: 'advanced',
  exerciseType: 'write',
  instructions: 'Using the LangGraph SDK, create a thread, ask about the weather in Tokyo, then follow up by asking about London on the SAME thread. Print both AI responses. The agent should call the get_weather tool for each city.',
  starterCode: 'from langgraph_sdk import get_client\n\nclient = get_client(url="http://127.0.0.1:2024")\n\n# Step 1: Create a thread\nthread = await client.threads.create()\n\n# Step 2: Send first message about Tokyo\n# YOUR CODE HERE — use client.runs.stream\n\n# Step 3: Send follow-up about London on the SAME thread\n# YOUR CODE HERE\n\nprint("DONE")',
  testCases: [
    { id: 'tc1', input: 'print(thread["thread_id"][:4])', expectedOutput: 'DONE', hidden: true, description: 'Thread created' },
    { id: 'tc2', input: 'print("DONE")', expectedOutput: 'DONE', description: 'Both queries complete' },
  ],
  hints: [
    'Use client.runs.stream with thread_id=thread["thread_id"], assistant_id="agent", and input={"messages": [{"role": "user", "content": "..."}]}',
    'For the follow-up, use the same thread_id. The server loads conversation history automatically.',
  ],
  solution: 'from langgraph_sdk import get_client\n\nclient = get_client(url="http://127.0.0.1:2024")\n\nthread = await client.threads.create()\n\nasync for event in client.runs.stream(\n    thread_id=thread["thread_id"],\n    assistant_id="agent",\n    input={"messages": [{"role": "user", "content": "What is the weather in Tokyo?"}]},\n    stream_mode="values",\n):\n    if event.event == "values":\n        msgs = event.data.get("messages", [])\n        if msgs:\n            print(f"Tokyo: {msgs[-1].get(\'content\', \'\')}")\n\nasync for event in client.runs.stream(\n    thread_id=thread["thread_id"],\n    assistant_id="agent",\n    input={"messages": [{"role": "user", "content": "How about London?"}]},\n    stream_mode="values",\n):\n    if event.event == "values":\n        msgs = event.data.get("messages", [])\n        if msgs:\n            print(f"London: {msgs[-1].get(\'content\', \'\')}")\n\nprint("DONE")',
  solutionExplanation: 'Reusing the same thread_id is the key. The server loads full conversation history, so the agent understands "How about London?" refers to weather from the previous message.',
  xpReward: 20,
}

Exercise 2: Create a Custom Assistant

Now build your own assistant with a different personality.

typescript
{
  type: 'exercise',
  id: 'create-custom-assistant',
  title: 'Exercise 2: Create a Custom Assistant',
  difficulty: 'advanced',
  exerciseType: 'write',
  instructions: 'Create a new assistant named "brief-weather-bot" from the "agent" graph. Give it a system prompt: "Reply with just the city and temperature, nothing else." Then query it about New York weather.',
  starterCode: 'from langgraph_sdk import get_client\n\nclient = get_client(url="http://127.0.0.1:2024")\n\n# Step 1: Create a custom assistant\nassistant = await client.assistants.create(\n    # YOUR CODE HERE\n)\n\n# Step 2: Create a thread and query YOUR assistant\nthread = await client.threads.create()\n# YOUR CODE HERE\n\nprint("DONE")',
  testCases: [
    { id: 'tc1', input: 'print(assistant["name"])', expectedOutput: 'brief-weather-bot', description: 'Name matches' },
    { id: 'tc2', input: 'print("DONE")', expectedOutput: 'DONE', description: 'Query completes' },
  ],
  hints: [
    'assistants.create needs graph_id="agent", name="brief-weather-bot", and config={"configurable": {"system_prompt": "..."}}.',
    'In client.runs.stream, use assistant["assistant_id"] — not "agent" — as the assistant_id.',
  ],
  solution: 'from langgraph_sdk import get_client\n\nclient = get_client(url="http://127.0.0.1:2024")\n\nassistant = await client.assistants.create(\n    graph_id="agent",\n    name="brief-weather-bot",\n    config={"configurable": {"system_prompt": "Reply with just the city and temperature, nothing else."}},\n)\n\nthread = await client.threads.create()\nasync for event in client.runs.stream(\n    thread_id=thread["thread_id"],\n    assistant_id=assistant["assistant_id"],\n    input={"messages": [{"role": "user", "content": "Weather in New York?"}]},\n    stream_mode="values",\n):\n    if event.event == "values":\n        msgs = event.data.get("messages", [])\n        if msgs:\n            print(msgs[-1].get("content", ""))\n\nprint("DONE")',
  solutionExplanation: 'Creating an assistant from the same graph gives you different behavior without touching graph code. The assistant_id from the response replaces the default "agent" name in your run calls.',
  xpReward: 20,
}

When Should You NOT Use LangGraph Platform?

Powerful as it is, the platform is overkill in a few situations.

One-shot chains with no memory. If your workflow is just “prompt in, answer out” with no state and no tools, a bare FastAPI endpoint does the job at lower cost.

Microsecond-sensitive paths. The server adds overhead for state management, checkpoints, and the HTTP layer. When every millisecond counts, call the LLM directly.

Tip: Use langgraph dev as a quick smoke test. If the local server covers what you need, then choose a hosting path. The free self-hosted tier covers up to 1 million node runs.

Heavy investment in another framework. If your team already runs CrewAI or AutoGen, switching just for the hosting layer isn’t worth the migration cost. Reach for BentoML or a FastAPI wrapper instead.

Avoiding vendor lock-in. Cloud SaaS ties you to LangChain’s servers. If that’s a concern, grab the self-hosted Docker option — it runs fully on your own gear.

Complete Code

The complete project files, ready to copy-paste and run:
python
# agent.py
# Complete code from: LangGraph Platform — Deploying Agents as APIs
# Requires: pip install langgraph langchain-openai langchain-core
# Python 3.10+

from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import ToolNode, tools_condition
from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    weather_data = {
        "London": "Cloudy, 15C",
        "Tokyo": "Sunny, 22C",
        "New York": "Rainy, 18C",
    }
    return weather_data.get(city, f"No data for {city}")

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
llm_with_tools = llm.bind_tools([get_weather])

def assistant(state: MessagesState):
    """Call the LLM with tool bindings."""
    return {"messages": [llm_with_tools.invoke(state["messages"])]}

graph_builder = StateGraph(MessagesState)
graph_builder.add_node("assistant", assistant)
graph_builder.add_node("tools", ToolNode([get_weather]))
graph_builder.add_edge(START, "assistant")
graph_builder.add_conditional_edges("assistant", tools_condition)
graph_builder.add_edge("tools", "assistant")

graph = graph_builder.compile()

json
// langgraph.json
{
  "dependencies": ["."],
  "graphs": {
    "agent": "./agent.py:graph"
  },
  "env": ".env"
}

python
# client.py — Interact with the deployed agent
# Requires: pip install langgraph-sdk
# Start the server first: langgraph dev

import asyncio
from langgraph_sdk import get_client

async def main():
    client = get_client(url="http://127.0.0.1:2024")

    thread = await client.threads.create()
    print(f"Thread: {thread['thread_id']}")

    # First message
    async for event in client.runs.stream(
        thread_id=thread["thread_id"],
        assistant_id="agent",
        input={"messages": [{"role": "user", "content": "What's the weather in Tokyo?"}]},
        stream_mode="values",
    ):
        if event.event == "values":
            msgs = event.data.get("messages", [])
            if msgs:
                print(f"[{msgs[-1]['type']}]: {msgs[-1].get('content', '')}")

    # Follow-up on same thread
    async for event in client.runs.stream(
        thread_id=thread["thread_id"],
        assistant_id="agent",
        input={"messages": [{"role": "user", "content": "How about London?"}]},
        stream_mode="values",
    ):
        if event.event == "values":
            msgs = event.data.get("messages", [])
            if msgs:
                print(f"[{msgs[-1]['type']}]: {msgs[-1].get('content', '')}")

asyncio.run(main())

Summary

LangGraph Platform takes a graph you built in a notebook and makes it available as a live API. Point langgraph.json at your graph, run langgraph dev, and you instantly get a server with streaming, state storage, and thread management built in.

The SDK keeps things simple on the client side. Create threads, send messages, stream answers — and the exact same code works whether you’re hitting localhost or a cloud URL. For slow workloads, background runs offload the wait. For prompt experiments, assistants let you change behavior on the fly without pushing new code.

Practice exercise: Extend the weather agent with a get_stock_price tool. Deploy it using langgraph dev. Spin up two assistants — one that gives short answers and one that gives detailed answers. Ask both the same question and compare what each returns.

Solution outline:

1. Add a `get_stock_price` tool with the `@tool` decorator and mock data.
2. Bind both tools: `llm.bind_tools([get_weather, get_stock_price])`.
3. Update `ToolNode` to include both tools.
4. Deploy with `langgraph dev`.
5. Create two assistants via `client.assistants.create()` with different prompts.
6. Create threads for each, send the same question, and compare.

Frequently Asked Questions

Can I use LangGraph Server without LangSmith?

Yes. The self-hosted options — Docker through langgraph up or a custom image from langgraph build — run entirely on your own machines with no LangSmith tie. You only need LangSmith for cloud SaaS hosting and the Studio web UI.

Does the server support JavaScript agents?

The server itself runs Python agents only. However, the SDK client ships in both Python and JavaScript flavors. So your agent logic stays in Python while your Node.js frontend talks to it through the JS SDK: import { Client } from "@langchain/langgraph-sdk".

How does pricing work?

The free self-hosted tier covers up to 1 million node runs. Cloud SaaS follows a pay-as-you-go model through LangSmith plans. BYOC and enterprise tiers need a license. See LangChain’s pricing page for current numbers.

Can I run multiple agents on one server?

Absolutely. Add more entries under "graphs" in langgraph.json. Each one becomes its own named assistant. Clients address each by name — a clean way to host several focused agents on shared hardware.

json
{
  "graphs": {
    "weather-agent": "./agents/weather.py:graph",
    "support-agent": "./agents/support.py:graph"
  }
}
