LangGraph Platform — Deploying Agents as APIs with LangGraph Server
You’ve built a LangGraph agent. It works on your laptop. But how do you let a frontend app, a Slack bot, or another service call it? You can’t ask users to import your graph and run graph.invoke() locally.
That’s the gap LangGraph Platform fills. It wraps your graph in an API server with built-in endpoints for streaming, state management, and long-running tasks. By the end of this article, you’ll deploy an agent as an API and call it from a client.
Here’s the big picture before we touch any code.
Your LangGraph graph is the brain. LangGraph Server wraps that brain in an HTTP API. It gives you REST endpoints for creating threads, sending messages, and streaming responses. The SDK client is a Python library that talks to those endpoints so you skip raw HTTP calls. And LangGraph Platform is the umbrella — server, CLI, Studio (a visual debugger), and cloud hosting combined.
The data flow goes like this. A client sends a request to the server. The server loads the right graph, runs it with the input, and persists state in a database. Then it streams the response back. Every run gets checkpointed automatically. Your agent handles long-running tasks, survives restarts, and picks up where it left off.
What Is LangGraph Platform?
LangGraph Platform is the deployment layer for LangGraph agents. It takes a graph you’ve built and turns it into a production service. No FastAPI routes to write, no database to manage, no streaming infrastructure to build.
The platform has four pieces:
- LangGraph Server — an API server with 30+ endpoints for threads, runs, streaming, assistants, and cron jobs.
- LangGraph SDK — Python and JavaScript client libraries for talking to the server.
- LangGraph CLI — a command-line tool for building, testing, and deploying locally.
- LangGraph Studio — a visual IDE for testing and debugging graphs interactively.
Why does this matter? Because deploying an agent isn’t the same as deploying a REST API. Agents need persistent state across requests. They need to stream tokens. They run tasks that take minutes, not milliseconds. Solving each of these yourself is months of work.
Key Insight: LangGraph Platform isn’t just “hosting.” It solves the hard infrastructure problems — persistent state, long-running background tasks, token-by-token streaming, and horizontal scaling — that you’d otherwise build yourself.
Prerequisites
- Python version: 3.10+
- Required libraries: langgraph (0.4+), langgraph-sdk (0.1.51+), langchain-openai (0.3+), langchain-core (0.3+)
- Install: pip install langgraph langgraph-cli langgraph-sdk langchain-openai langchain-core
- API key: an OpenAI API key set as OPENAI_API_KEY. See OpenAI’s docs to create one.
- Docker: required for langgraph up. Install from docker.com.
- Time to complete: ~40 minutes
- Prior knowledge: Basic LangGraph concepts (nodes, edges, state) from earlier posts in this series.
Setting Up a LangGraph Project for Deployment
Ever tried deploying a Python script and realized you need a whole project structure first? Same thing here. LangGraph Server needs to know where your graph lives and what dependencies to install.
Here’s the minimum folder layout:
my-agent/
├── agent.py # Your graph definition
├── langgraph.json # Server configuration
└── requirements.txt # Python dependencies
The langgraph.json file is what the server reads on startup. It maps graph names to Python import paths. Here’s a minimal config:
{
"dependencies": ["."],
"graphs": {
"agent": "./agent.py:graph"
},
"env": ".env"
}
Three fields, three jobs. The "graphs" field maps the name "agent" to a Python path — "./agent.py:graph" means “import the graph variable from agent.py.” The "dependencies" field says “install packages from the current directory.” And "env" points to your environment variables file.
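The path:variable convention is easy to get wrong, so here is a tiny illustrative parser (my own helper, not part of the CLI or server) that splits a graph reference the same way the server interprets it:

```python
import json

def parse_graph_ref(ref: str) -> tuple[str, str]:
    """Split a graph reference like './agent.py:graph' into the file
    path and the variable name, mirroring how the server imports it."""
    path, sep, var = ref.partition(":")
    if not sep or not var:
        raise ValueError(f"expected 'path:variable', got {ref!r}")
    return path, var

# Parse the minimal config shown above
config = json.loads('{"graphs": {"agent": "./agent.py:graph"}}')
for name, ref in config["graphs"].items():
    path, var = parse_graph_ref(ref)
    print(name, "->", path, ":", var)
```

If the server fails to start with an import error, checking each graph reference against this rule (file path, colon, variable name) is usually the first debugging step.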
Now we need the actual agent code. This graph uses an LLM with tool-calling. The agent answers questions and calls a weather tool when needed. It follows the ReAct pattern — the LLM decides whether to call a tool or respond directly.
# agent.py
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import ToolNode, tools_condition
from langchain_core.tools import tool
@tool
def get_weather(city: str) -> str:
"""Get the current weather for a city."""
# Simplified for demo — real apps call a weather API
weather_data = {
"London": "Cloudy, 15C",
"Tokyo": "Sunny, 22C",
"New York": "Rainy, 18C",
}
return weather_data.get(city, f"No data for {city}")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
llm_with_tools = llm.bind_tools([get_weather])
def assistant(state: MessagesState):
"""Call the LLM with tool bindings."""
return {"messages": [llm_with_tools.invoke(state["messages"])]}
graph_builder = StateGraph(MessagesState)
graph_builder.add_node("assistant", assistant)
graph_builder.add_node("tools", ToolNode([get_weather]))
graph_builder.add_edge(START, "assistant")
graph_builder.add_conditional_edges("assistant", tools_condition)
graph_builder.add_edge("tools", "assistant")
graph = graph_builder.compile()
The assistant node calls the LLM. If the LLM wants to use a tool, tools_condition routes to the tools node. After the tool runs, control returns to the assistant. When the LLM responds without a tool call, the graph ends.
And the requirements file keeps it simple:
langgraph>=0.4
langchain-openai>=0.3
langchain-core>=0.3
Tip: Pin your dependency versions in production. Using >= is fine for tutorials, but production apps should lock exact versions like langgraph==0.4.3. One upstream breaking change can take down your agent.
Running LangGraph Server Locally
You’ve got three files. Time to see if they actually work. The CLI gives you two commands: langgraph dev for quick testing, and langgraph up for a Docker-based setup that mirrors production.
I’d recommend starting with langgraph dev. It’s the fastest path — no Docker, no database. It starts an in-memory server that’s perfect for trying things out.
langgraph dev
The output tells you everything you need:
Ready!
- API: http://127.0.0.1:2024
- Docs: http://127.0.0.1:2024/docs
- LangGraph Studio Web UI: https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024
Your agent is now an API on port 2024. The server read langgraph.json, found your graph, set up all the endpoints, and started listening. Open http://127.0.0.1:2024/docs in a browser to see the full API documentation.
Want persistent state that survives restarts? Switch to langgraph up:
langgraph up
This builds a Docker image and starts PostgreSQL for state storage. It takes longer the first time, but your agent’s conversations survive restarts. That’s the setup you’d use for staging.
Warning: langgraph dev stores everything in memory. When the process stops, all threads, checkpoints, and conversation history vanish. Use it for development only.
Note: What langgraph up actually runs: behind the scenes, it creates a Docker Compose stack with two containers — your agent server and a PostgreSQL database. The server connects to PostgreSQL for checkpoint storage, giving you the same persistence model as a cloud deployment.
Talking to the Server with the LangGraph SDK
The server is up. How do you send it messages? You could fire raw HTTP requests — the server exposes a full REST API. But the LangGraph SDK handles authentication, serialization, and streaming for you.
Here’s the SDK client connecting to your local server. The get_client function takes a URL and returns a client you’ll use for everything.
from langgraph_sdk import get_client
client = get_client(url="http://127.0.0.1:2024")
# Check which graphs are available
assistants = await client.assistants.search()
print(assistants)
The response confirms your agent is registered:
[{'assistant_id': 'agent', 'graph_id': 'agent', ...}]
Every graph in langgraph.json shows up as an “assistant” on the server. The names match.
Creating Threads and Sending Messages
What’s a “thread”? It’s a conversation container. It holds all the messages and state for one interaction. You create a thread, then send runs (messages) to it.
The client.runs.stream method sends a message to your agent and yields events as the graph executes. Each event carries a piece of the response.
# Create a new conversation thread
thread = await client.threads.create()
print(f"Thread ID: {thread['thread_id']}")
# Send a message and stream the response
async for event in client.runs.stream(
thread_id=thread["thread_id"],
assistant_id="agent",
input={"messages": [{"role": "user", "content": "What's the weather in Tokyo?"}]},
):
if event.event == "values":
messages = event.data.get("messages", [])
if messages:
last = messages[-1]
print(f"[{last['type']}]: {last.get('content', '')}")
Here’s what streams back:
[ai]:
[tool]: Sunny, 22C
[ai]: The weather in Tokyo is currently sunny with a temperature of 22C.
Three events tell the story. The LLM decided to call the weather tool (empty AI message with tool call metadata). The tool returned “Sunny, 22C.” Then the LLM composed a natural response from the tool result.
Thread State Persists Automatically
Here’s where it gets interesting. Send a follow-up on the same thread — without repeating any context:
async for event in client.runs.stream(
thread_id=thread["thread_id"],
assistant_id="agent",
input={"messages": [{"role": "user", "content": "How about London?"}]},
):
if event.event == "values":
messages = event.data.get("messages", [])
if messages:
last = messages[-1]
print(f"[{last['type']}]: {last.get('content', '')}")
And you get:
[ai]:
[tool]: Cloudy, 15C
[ai]: The weather in London is currently cloudy with a temperature of 15C.
The agent understood “How about London?” because the server loaded the thread’s full history before running the graph. You didn’t pass previous messages. The server handled that. This is how production chatbots work.
Key Insight: The server manages state, not your client. Your client sends a message and a thread ID. The server loads history, runs the graph with full context, saves the updated state, and streams the result. Your client stays stateless.
LangGraph Platform API Endpoints — What You Get for Free
You don’t get a single “invoke” endpoint. You get 30+ endpoints organized around five resources. Here’s the map:
| Resource | What It Manages | Key Endpoints |
|---|---|---|
| Assistants | Graph configurations | POST /assistants, GET /assistants/search |
| Threads | Conversation state | POST /threads, GET /threads/{id}/state |
| Runs | Graph executions | POST /runs, POST /runs/stream, POST /runs/wait |
| Cron Jobs | Scheduled runs | POST /threads/{id}/runs/crons |
| Store | Long-term memory | PUT /store/items, POST /store/items/search |
The three run modes deserve a closer look:
- POST /runs — background. Returns immediately with a run ID while the graph executes in the task queue. Best for tasks that take minutes.
- POST /runs/stream — streaming. Yields events as the graph executes. Best for chatbots.
- POST /runs/wait — blocking. Waits for the final output. Best for quick queries under 30 seconds.
Want to see what’s happening under the hood? Here’s the same interaction via curl:
curl -X POST http://127.0.0.1:2024/threads \
-H "Content-Type: application/json" \
-d '{}'
Response:
{"thread_id": "abc123-...", "created_at": "...", "metadata": {}}
Then invoke the agent on that thread:
curl -X POST http://127.0.0.1:2024/threads/abc123/runs/wait \
-H "Content-Type: application/json" \
-d '{
"assistant_id": "agent",
"input": {"messages": [{"role": "user", "content": "Weather in New York?"}]}
}'
The response contains the full graph output — tool calls, tool results, and the final answer. The SDK wraps exactly these HTTP calls in a cleaner interface.
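If you want to see the exact HTTP the SDK is wrapping, here is a standard-library sketch. The build_run_request helper is a hypothetical name of mine; the endpoint and payload shape mirror the curl call above:

```python
import json
from urllib import request

BASE = "http://127.0.0.1:2024"  # local `langgraph dev` server

def build_run_request(thread_id: str, text: str) -> request.Request:
    """Build the same blocking-run POST the SDK issues under the hood."""
    body = json.dumps({
        "assistant_id": "agent",
        "input": {"messages": [{"role": "user", "content": text}]},
    }).encode()
    return request.Request(
        f"{BASE}/threads/{thread_id}/runs/wait",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually send it, the server must be running:
# with request.urlopen(build_run_request("abc123", "Weather in New York?")) as resp:
#     print(json.load(resp))
```

In practice you would use the SDK, but seeing the raw request makes it clear there is no magic: a thread ID in the URL, an assistant ID and input in the body.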
Background Runs for Long-Running Tasks
What if your agent needs five minutes to research a topic and write a report? You don’t want an HTTP connection hanging that long.
LangGraph Server solves this with background runs. You kick off the run, get a run ID immediately, and check back later. The server executes your graph in a task queue behind the scenes.
The client.runs.create method starts a background run and returns right away with the run’s metadata:
run = await client.runs.create(
thread_id=thread["thread_id"],
assistant_id="agent",
input={"messages": [{"role": "user", "content": "What's the weather in all three cities?"}]},
)
print(f"Run ID: {run['run_id']}")
print(f"Status: {run['status']}")
Immediate response:
Run ID: run-456...
Status: pending
Check on it later:
run_status = await client.runs.get(
thread_id=thread["thread_id"],
run_id=run["run_id"],
)
print(f"Status: {run_status['status']}")
Status: success
Once the status is success, read the thread state for the result. This is how you handle production workloads where blocking a web server thread on agent execution isn’t an option.
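If you do need manual polling (say, from a worker that can't hold a connection open), the loop is simple. This sketch is plain Python with a fake status callable standing in for client.runs.get, so it runs without a live server:

```python
import time

def poll_until_done(get_status, timeout=300.0, base_delay=0.5):
    """Poll a status callable until it leaves the pending/running states.
    `get_status` stands in for: lambda: client.runs.get(...)['status']."""
    delay, waited = base_delay, 0.0
    while waited < timeout:
        status = get_status()
        if status not in ("pending", "running"):
            return status
        time.sleep(delay)
        waited += delay
        delay = min(delay * 2, 10.0)  # exponential backoff, capped at 10s
    raise TimeoutError("run did not finish within the timeout")

# Demo with a canned status sequence instead of a live server:
statuses = iter(["pending", "running", "success"])
print(poll_until_done(lambda: next(statuses), base_delay=0.01))  # success
```

The backoff keeps you from hammering the server on long runs while still reacting quickly to fast ones.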
Tip: Use client.runs.join instead of manual polling. The call await client.runs.join(thread_id, run_id) blocks until the run finishes. It polls internally so you don’t write retry logic yourself.
LangGraph Platform Deployment Options — Local, Cloud, and Self-Hosted
We’ve run the server locally. For production, you pick from four options. Each trades convenience for control.
| Option | Where It Runs | Best For | State Persistence | Cost |
|---|---|---|---|---|
| langgraph dev | Your machine | Development | In-memory only | Free |
| Cloud SaaS | LangSmith infrastructure | Fast deploy, small teams | Managed PostgreSQL | Usage-based |
| BYOC | Your AWS/GCP VPC | Data residency needs | Your database | License |
| Self-Hosted | Your infrastructure | Maximum control | Your database | License |
Cloud Deployment
Cloud SaaS is the fastest path to production. Your code lives in GitHub. The platform builds and deploys it.
The steps:
- Push your project (with
langgraph.json) to GitHub. - Connect the repo in the LangSmith console.
- Set environment variables (API keys) in deployment settings.
- Click deploy.
The platform builds a Docker image, provisions PostgreSQL, and gives you a URL. Your SDK client code changes by exactly two fields:
cloud_client = get_client(
url="https://your-deployment-id.us.langgraph.app",
api_key="your-langsmith-api-key",
)
# Everything else is identical
thread = await cloud_client.threads.create()
async for event in cloud_client.runs.stream(
thread_id=thread["thread_id"],
assistant_id="agent",
input={"messages": [{"role": "user", "content": "Weather in Tokyo?"}]},
):
print(event.data)
That’s the SDK’s big win. Same code, different URL. Local or cloud — the client doesn’t care.
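A common pattern is to resolve the URL and key from the environment, so identical client code runs against localhost or the cloud. Note that LANGGRAPH_API_URL here is a naming convention of this sketch, not a variable the SDK reads on its own:

```python
import os

def client_settings() -> dict:
    """Resolve connection settings from the environment, defaulting to
    the local `langgraph dev` server. LANGGRAPH_API_URL is our own
    convention; the SDK just takes whatever url/api_key it is given."""
    settings = {"url": os.environ.get("LANGGRAPH_API_URL", "http://127.0.0.1:2024")}
    api_key = os.environ.get("LANGSMITH_API_KEY")  # only needed for cloud deployments
    if api_key:
        settings["api_key"] = api_key
    return settings

# client = get_client(**client_settings())  # identical call, local or cloud
print(client_settings()["url"])
```

This keeps deployment targets out of your code entirely: CI, staging, and production differ only in environment variables.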
Self-Hosted with Docker
Need to run on your own servers? Build a Docker image from your project:
langgraph build -t my-agent-server
This gives you a portable image. Push it to any container registry. Deploy on Kubernetes, ECS, Cloud Run — anywhere Docker runs.
The server needs PostgreSQL for state. Pass the connection string as an environment variable:
docker run -p 8123:8000 \
-e OPENAI_API_KEY="your-key" \
-e DATABASE_URI="postgresql://user:pass@host:5432/langgraph" \
my-agent-server
Warning: Never bake API keys into Docker images. Use environment variables, secrets managers (AWS Secrets Manager, Vault), or Kubernetes secrets. A leaked key in a container image is a production incident.
Assistants — Versioning Without Redeploying
Here’s a scenario you’ll hit quickly in production. You want to A/B test two system prompts. Or your customer success team wants a friendlier tone while the developer API stays technical.
Assistants solve this. An assistant is a named configuration of your graph. Same graph code, different behavior.
Create a custom assistant with client.assistants.create. The config parameter passes overrides your graph reads at runtime:
assistant = await client.assistants.create(
graph_id="agent",
config={
"configurable": {
"system_prompt": "You are a concise weather bot. Temperature in Celsius and Fahrenheit."
}
},
name="weather-expert",
)
print(f"Assistant ID: {assistant['assistant_id']}")
Assistant ID: asst-789...
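One catch worth flagging: the agent.py shown earlier never reads system_prompt, so a config override only changes behavior once a node actually consumes it. Here is a minimal sketch of the graph-side half. The resolve_system_prompt helper is a hypothetical name of mine, and the commented node shows where it would plug into the assistant node from agent.py:

```python
def resolve_system_prompt(config: dict, default: str = "You are a helpful assistant.") -> str:
    """Pull the per-assistant override out of the RunnableConfig-shaped
    dict that LangGraph passes to each node, falling back to a default."""
    return (config or {}).get("configurable", {}).get("system_prompt", default)

# How the assistant node from agent.py could consume it (sketch):
# def assistant(state: MessagesState, config):
#     prompt = resolve_system_prompt(config)
#     messages = [{"role": "system", "content": prompt}, *state["messages"]]
#     return {"messages": [llm_with_tools.invoke(messages)]}

print(resolve_system_prompt({"configurable": {"system_prompt": "Be brief."}}))  # Be brief.
print(resolve_system_prompt({}))  # You are a helpful assistant.
```

With the node reading the config this way, every assistant created from the same graph gets its own prompt at runtime.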
Need to tweak the prompt later? Update it through the API — no redeployment:
updated = await client.assistants.update(
assistant_id=assistant["assistant_id"],
config={
"configurable": {
"system_prompt": "You are a detailed weather bot. Include temperature, humidity, and wind."
}
},
)
This is how prompt engineering works in production. Push code once. Refine behavior through the API. I prefer this over redeploying for every prompt change — it’s faster and lower risk.
Tip: Use assistants to separate concerns. One graph, multiple assistants: a “support-bot” for customers, a “dev-bot” for internal use, a “test-bot” for QA. Each has its own system prompt, model choice, and tool config. Zero code duplication.
Streaming Modes — Pick Your Granularity
Not all streaming is equal. A chatbot needs token-by-token output. A monitoring dashboard wants state changes. A debugger wants everything. The server supports four modes:
| Mode | What Streams | Use Case |
|---|---|---|
| values | Full state after each node | Debugging, full visibility |
| messages | LLM tokens one-by-one | Chatbot UIs |
| updates | Only changes per node | Monitoring dashboards |
| events | Internal LangGraph events | Advanced custom logic |
For a chatbot, use messages mode. It streams individual tokens as the LLM generates them — the same experience as ChatGPT:
async for event in client.runs.stream(
thread_id=thread["thread_id"],
assistant_id="agent",
input={"messages": [{"role": "user", "content": "Tell me about Tokyo's weather"}]},
stream_mode="messages",
):
if hasattr(event, 'data') and event.data:
if isinstance(event.data, dict) and "content" in event.data:
print(event.data["content"], end="", flush=True)
Each call to print renders a single token. Your frontend shows text appearing word by word.
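On the client side, rendering messages mode comes down to concatenating chunks. Here is a minimal, framework-free sketch, assuming each event's data is a dict with an optional content field as in the loop above:

```python
def render_stream(chunks) -> str:
    """Join streamed token chunks into the final text a chat UI shows.
    Each chunk is a dict that may or may not carry a content field."""
    parts = []
    for chunk in chunks:
        content = chunk.get("content")
        if content:
            parts.append(content)
            # a real UI would flush each piece to the screen here
    return "".join(parts)

print(render_stream([{"content": "Tok"}, {"content": "yo is "}, {"type": "done"}, {"content": "sunny."}]))
# Tokyo is sunny.
```

The same accumulate-and-flush shape works in a browser: append each chunk to the DOM as it arrives and the full message assembles itself.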
For debugging, switch to values mode. It dumps the full state after each node runs:
async for event in client.runs.stream(
thread_id=thread["thread_id"],
assistant_id="agent",
input={"messages": [{"role": "user", "content": "Weather in London?"}]},
stream_mode="values",
):
if event.event == "values":
msgs = event.data.get("messages", [])
print(f"--- {len(msgs)} messages in state ---")
for m in msgs:
print(f" [{m['type']}]: {m.get('content', '[tool call]')[:60]}")
You see the state growing step by step — user message, LLM tool call, tool response, final answer. When something goes wrong, this is how you find it.
Common Mistakes and How to Fix Them
Mistake 1: Wrong Graph Variable Name in langgraph.json
{
"graphs": {
"agent": "./agent.py:app"
}
}
Why it breaks: Your file has graph = graph_builder.compile() but the config says :app. The server can’t find the object and throws an import error.
The fix: Match the name after the colon to your actual variable:
{
"graphs": {
"agent": "./agent.py:graph"
}
}
Mistake 2: Running langgraph dev in Production
# This will lose user data on every restart
langgraph dev --host 0.0.0.0
Why it breaks: In-memory storage. Every restart wipes all threads and conversations. Your users lose everything.
The fix:
langgraph up
Mistake 3: Missing Environment Variables
langgraph dev
# Server starts fine, but first request crashes:
# ERROR: OPENAI_API_KEY not set
Why it’s confusing: The server boots without API keys. It only fails when the first request reaches the LLM. In Docker, this is even harder to debug.
The fix: Create a .env file and reference it in langgraph.json:
# .env
OPENAI_API_KEY=sk-your-key-here
LANGSMITH_API_KEY=your-langsmith-key
Warning: Make sure your .env file is in .gitignore. It’s easy to commit API keys to your repo when the .env file sits next to langgraph.json. Add .env to .gitignore before your first commit.
Exercise 1: Deploy and Query Your Agent
You’ve seen the pieces. Put them together yourself. Deploy the weather agent locally and run a multi-turn conversation.
Instructions: Using the LangGraph SDK, create a thread, ask about the weather in Tokyo, then follow up by asking about London on the SAME thread. Print both AI responses. The agent should call the get_weather tool for each city.
Starter code:
from langgraph_sdk import get_client
client = get_client(url="http://127.0.0.1:2024")
# Step 1: Create a thread
thread = await client.threads.create()
# Step 2: Send first message about Tokyo
# YOUR CODE HERE — use client.runs.stream
# Step 3: Send follow-up about London on the SAME thread
# YOUR CODE HERE
print("DONE")
Hints:
- Use client.runs.stream with thread_id=thread["thread_id"], assistant_id="agent", and input={"messages": [{"role": "user", "content": "..."}]}.
- For the follow-up, use the same thread_id. The server loads conversation history automatically.
Solution:
from langgraph_sdk import get_client
client = get_client(url="http://127.0.0.1:2024")
thread = await client.threads.create()
async for event in client.runs.stream(
    thread_id=thread["thread_id"],
    assistant_id="agent",
    input={"messages": [{"role": "user", "content": "What is the weather in Tokyo?"}]},
    stream_mode="values",
):
    if event.event == "values":
        msgs = event.data.get("messages", [])
        if msgs:
            print(f"Tokyo: {msgs[-1].get('content', '')}")
async for event in client.runs.stream(
    thread_id=thread["thread_id"],
    assistant_id="agent",
    input={"messages": [{"role": "user", "content": "How about London?"}]},
    stream_mode="values",
):
    if event.event == "values":
        msgs = event.data.get("messages", [])
        if msgs:
            print(f"London: {msgs[-1].get('content', '')}")
print("DONE")
Why it works: Reusing the same thread_id is the key. The server loads the full conversation history, so the agent understands that "How about London?" refers to weather from the previous message.
Exercise 2: Create a Custom Assistant
Now try creating your own assistant with a custom personality.
Instructions: Create a new assistant named "brief-weather-bot" from the "agent" graph. Give it a system prompt: "Reply with just the city and temperature, nothing else." Then query it about New York weather.
Starter code:
from langgraph_sdk import get_client
client = get_client(url="http://127.0.0.1:2024")
# Step 1: Create a custom assistant
assistant = await client.assistants.create(
    # YOUR CODE HERE
)
# Step 2: Create a thread and query YOUR assistant
thread = await client.threads.create()
# YOUR CODE HERE
print("DONE")
Hints:
- assistants.create needs graph_id="agent", name="brief-weather-bot", and config={"configurable": {"system_prompt": "..."}}.
- In client.runs.stream, use assistant["assistant_id"] — not "agent" — as the assistant_id.
Solution:
from langgraph_sdk import get_client
client = get_client(url="http://127.0.0.1:2024")
assistant = await client.assistants.create(
    graph_id="agent",
    name="brief-weather-bot",
    config={"configurable": {"system_prompt": "Reply with just the city and temperature, nothing else."}},
)
thread = await client.threads.create()
async for event in client.runs.stream(
    thread_id=thread["thread_id"],
    assistant_id=assistant["assistant_id"],
    input={"messages": [{"role": "user", "content": "Weather in New York?"}]},
    stream_mode="values",
):
    if event.event == "values":
        msgs = event.data.get("messages", [])
        if msgs:
            print(msgs[-1].get("content", ""))
print("DONE")
Why it works: Creating an assistant from the same graph gives you different behavior without touching graph code. The assistant_id from the response replaces the default "agent" name in your run calls.
When NOT to Use LangGraph Platform
LangGraph Platform is powerful. But it’s overkill for some situations.
Simple stateless chains. If your workflow is prompt-in, response-out with no memory or tools, a plain FastAPI endpoint is simpler and cheaper.
Sub-10ms latency requirements. The server adds overhead for state management, checkpointing, and the HTTP layer. For latency-critical paths, call the LLM directly.
Tip: Start with langgraph dev to validate your architecture. If the local server handles your needs, then pick a production option. The self-hosted lite tier is free for up to 1 million node executions.
Existing framework investment. If your team already uses CrewAI or AutoGen, migrating just for deployment isn’t worth it. Use BentoML or a FastAPI wrapper instead.
Vendor lock-in concerns. The cloud SaaS ties you to LangChain’s infrastructure. If that’s a dealbreaker, the self-hosted Docker option runs independently.
Complete Code
Full project files (copy-paste and run):
# agent.py
# Complete code from: LangGraph Platform — Deploying Agents as APIs
# Requires: pip install langgraph langchain-openai langchain-core
# Python 3.10+
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import ToolNode, tools_condition
from langchain_core.tools import tool
@tool
def get_weather(city: str) -> str:
"""Get the current weather for a city."""
weather_data = {
"London": "Cloudy, 15C",
"Tokyo": "Sunny, 22C",
"New York": "Rainy, 18C",
}
return weather_data.get(city, f"No data for {city}")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
llm_with_tools = llm.bind_tools([get_weather])
def assistant(state: MessagesState):
"""Call the LLM with tool bindings."""
return {"messages": [llm_with_tools.invoke(state["messages"])]}
graph_builder = StateGraph(MessagesState)
graph_builder.add_node("assistant", assistant)
graph_builder.add_node("tools", ToolNode([get_weather]))
graph_builder.add_edge(START, "assistant")
graph_builder.add_conditional_edges("assistant", tools_condition)
graph_builder.add_edge("tools", "assistant")
graph = graph_builder.compile()
# langgraph.json
{
"dependencies": ["."],
"graphs": {
"agent": "./agent.py:graph"
},
"env": ".env"
}
# client.py — Interact with the deployed agent
# Requires: pip install langgraph-sdk
# Start the server first: langgraph dev
import asyncio
from langgraph_sdk import get_client
async def main():
client = get_client(url="http://127.0.0.1:2024")
thread = await client.threads.create()
print(f"Thread: {thread['thread_id']}")
# First message
async for event in client.runs.stream(
thread_id=thread["thread_id"],
assistant_id="agent",
input={"messages": [{"role": "user", "content": "What's the weather in Tokyo?"}]},
stream_mode="values",
):
if event.event == "values":
msgs = event.data.get("messages", [])
if msgs:
print(f"[{msgs[-1]['type']}]: {msgs[-1].get('content', '')}")
# Follow-up on same thread
async for event in client.runs.stream(
thread_id=thread["thread_id"],
assistant_id="agent",
input={"messages": [{"role": "user", "content": "How about London?"}]},
stream_mode="values",
):
if event.event == "values":
msgs = event.data.get("messages", [])
if msgs:
print(f"[{msgs[-1]['type']}]: {msgs[-1].get('content', '')}")
asyncio.run(main())
Summary
LangGraph Platform turns your notebook agent into a production API. Point langgraph.json at your graph, run langgraph dev, and you get a server with streaming, state persistence, and thread management.
The SDK is your interface. Create threads, send messages, stream responses — same code whether you’re hitting localhost or a cloud deployment. Background runs handle slow tasks. Assistants let you version behavior without redeploying code.
Practice exercise: Build a multi-tool agent — add a get_stock_price tool alongside get_weather. Deploy it with langgraph dev. Create two assistants with different system prompts (one concise, one detailed). Query each with the same question and compare the output.
Solution outline:
1. Add a `get_stock_price` tool with the `@tool` decorator and mock data.
2. Bind both tools: `llm.bind_tools([get_weather, get_stock_price])`.
3. Update `ToolNode` to include both tools.
4. Deploy with `langgraph dev`.
5. Create two assistants via `client.assistants.create()` with different prompts.
6. Create threads for each, send the same question, and compare.
Frequently Asked Questions
Can I use LangGraph Server without LangSmith?
Yes. Self-hosted options (Docker via langgraph up or langgraph build) run on your infrastructure with no LangSmith dependency. You need LangSmith only for cloud SaaS and the Studio web UI.
Does the server support JavaScript agents?
LangGraph Server runs Python agents only. But the SDK client comes in both Python and JavaScript. Your agent runs in Python; your Node.js app talks to it via the JS SDK: import { Client } from "@langchain/langgraph-sdk".
How does pricing work?
Self-hosted lite is free up to 1 million node executions. Cloud SaaS is usage-based through LangSmith plans. BYOC and enterprise require a license. See LangChain’s pricing page for current rates.
Can I run multiple agents on one server?
Yes. Add multiple entries to "graphs" in langgraph.json. Each becomes a separate assistant. Clients address each by name — useful when you have specialized agents sharing infrastructure.
{
"graphs": {
"weather-agent": "./agents/weather.py:graph",
"support-agent": "./agents/support.py:graph"
}
}