LangGraph Platform: Deploy Agents as APIs
Deploy LangGraph agents as scalable APIs with LangGraph Server — step-by-step guide covering SDK setup, streaming, background runs, and cloud hosting.
LangGraph Platform wraps your agent in a ready-made API server so any app can talk to it — here’s how to go from notebook to live service in minutes.
So you’ve got a working LangGraph agent on your laptop. Great. But a Slack bot, a React frontend, or a cron job can’t just import your Python file and run it. They need an API to call.
Building that API from scratch is more work than you’d think. You need REST routes, a database for chat state, a streaming layer, and a queue for slow tasks. LangGraph Platform bundles all of that into one package. In this guide, I’ll walk you through deploying an agent as an API and calling it from a Python client.
Before we write code, let me lay out how the parts fit together.
Think of your LangGraph graph as the engine of a car. LangGraph Server is the chassis — it takes that engine and puts it behind a set of HTTP endpoints. Clients send requests; the server loads the correct graph, runs it, stores state in a database, and pushes the output back. An SDK client library lets you skip raw HTTP and work with clean Python methods instead. The term “LangGraph Platform” covers everything: the server, the CLI, the visual Studio debugger, and cloud hosting options.
One detail worth highlighting: every single run gets saved as a checkpoint. If your server restarts, nothing is lost. The agent picks up from its last saved point.
What Is LangGraph Platform?
In short, it’s the “go live” layer for LangGraph agents. Hand it a graph, and it turns that graph into a running service — no need to write your own FastAPI routes, stand up a database, or rig streaming from scratch.
Four pieces make up the platform:
- LangGraph Server — provides 30+ REST endpoints covering threads, runs, streaming, assistants, and cron jobs.
- LangGraph SDK — gives you Python and JavaScript clients that wrap those endpoints.
- LangGraph CLI — lets you build, test, and launch servers from the terminal.
- LangGraph Studio — a visual workspace where you test and debug graphs in real time.
Why go through all this? Because agents aren’t typical web services. They hold state across many requests. They stream tokens one at a time. Some tasks run for minutes, not seconds. Wiring each of those features by hand is a project in itself.
> Key Insight: LangGraph Platform goes far beyond simple hosting. It tackles the hard stuff — saving state across requests, running tasks that take minutes in the background, pushing tokens to clients one by one, and scaling across machines — so you don’t have to.
Prerequisites
- Python version: 3.10+
- Required libraries: langgraph (0.4+), langgraph-sdk (0.1.51+), langchain-openai (0.3+), langchain-core (0.3+)
- Install: `pip install langgraph langgraph-cli langgraph-sdk langchain-openai langchain-core`
- API key: An OpenAI API key set as `OPENAI_API_KEY`. See OpenAI’s docs to create one.
- Docker: Required for `langgraph up`. Install from docker.com.
- Time to complete: ~40 minutes
- Prior knowledge: Basic LangGraph concepts (nodes, edges, state) from earlier posts in this series.
How Do You Structure a Project for LangGraph Server?
If you’ve ever tried shipping a Python script and then realized it needs a proper folder layout, this will feel familiar. The server has to know two things: where your graph object lives, and what packages it depends on.
At a minimum, you need three files:
```text
my-agent/
├── agent.py          # Your graph definition
├── langgraph.json    # Server configuration
└── requirements.txt  # Python dependencies
```
Let me walk through each one. The config file langgraph.json is the entry point the server reads when it boots. Here’s the shortest version that works:
```json
{
  "dependencies": ["."],
  "graphs": {
    "agent": "./agent.py:graph"
  },
  "env": ".env"
}
```
What do these fields mean? "graphs" is a name-to-path map. The key "agent" is the public name clients will use. The value "./agent.py:graph" tells the server to look for a variable called graph inside agent.py. "dependencies" says “install from the current directory.” And "env" points at a file holding secrets like API keys.
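To make the `path:variable` syntax concrete, here’s a small stdlib-only sketch of how a loader could split a graph spec into a file path and a variable name. This is illustrative only — it mimics the config format, not LangGraph’s actual loader.

```python
import json

# Parse a config shaped like langgraph.json (illustrative sketch).
config = json.loads("""
{
  "dependencies": ["."],
  "graphs": {"agent": "./agent.py:graph"},
  "env": ".env"
}
""")

def split_spec(spec: str) -> tuple[str, str]:
    """Split "./agent.py:graph" into (file path, variable name)."""
    path, _, variable = spec.rpartition(":")
    return path, variable

for public_name, spec in config["graphs"].items():
    path, variable = split_spec(spec)
    print(f"{public_name!r} -> load {variable!r} from {path}")
```

The key takeaway: the part before the colon is a file path relative to the project root, and the part after it must name a variable that exists at module level.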
Next up: the agent itself. This one uses an LLM that can call a weather tool. It follows the ReAct loop — at each step the LLM either calls a tool or writes a final answer.
```python
# agent.py
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import ToolNode, tools_condition
from langchain_core.tools import tool


@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    # Simplified for demo — real apps call a weather API
    weather_data = {
        "London": "Cloudy, 15C",
        "Tokyo": "Sunny, 22C",
        "New York": "Rainy, 18C",
    }
    return weather_data.get(city, f"No data for {city}")


llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
llm_with_tools = llm.bind_tools([get_weather])


def assistant(state: MessagesState):
    """Call the LLM with tool bindings."""
    return {"messages": [llm_with_tools.invoke(state["messages"])]}


graph_builder = StateGraph(MessagesState)
graph_builder.add_node("assistant", assistant)
graph_builder.add_node("tools", ToolNode([get_weather]))
graph_builder.add_edge(START, "assistant")
graph_builder.add_conditional_edges("assistant", tools_condition)
graph_builder.add_edge("tools", "assistant")
graph = graph_builder.compile()
```
Here’s the flow: the assistant node asks the LLM what to do. If the LLM picks the weather tool, tools_condition steers the graph to the tools node. Once the tool finishes, control loops back to assistant. When the LLM answers without calling any tool, the graph wraps up.
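The branching decision that `tools_condition` makes can be sketched in plain Python. This is a simplified model of the routing rule, not the library’s actual implementation:

```python
# Toy version of the ReAct routing rule: if the last AI message
# requested a tool, route to the "tools" node; otherwise finish.
def route(last_message: dict) -> str:
    """Return the next node name based on the last AI message."""
    return "tools" if last_message.get("tool_calls") else "END"

# An AI message carrying a tool call loops back through the tool node.
print(route({"content": "", "tool_calls": [{"name": "get_weather"}]}))  # tools
# A plain text answer ends the run.
print(route({"content": "Sunny in Tokyo."}))  # END
```

The real condition inspects LangChain message objects rather than dicts, but the decision logic is the same: tool calls present means keep looping, otherwise stop.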
Finally, the dependency file is just three lines:
```text
langgraph>=0.4
langchain-openai>=0.3
langchain-core>=0.3
```
> Tip: Lock exact versions before you ship. Loose pins like `>=` work fine while learning, but in a live app you want `langgraph==0.4.3` to avoid surprise breakage from upstream changes.
How Do You Start the Server on Your Machine?
You now have three files. Let’s fire things up. The CLI ships two commands: langgraph dev for fast, in-memory testing, and langgraph up for a Docker-backed setup that behaves like a real deployment.
My advice: start with langgraph dev. No Docker needed, no database to spin up. It launches a lightweight server you can hit right away.
```bash
langgraph dev
```
You’ll see output like this:
```text
Ready!
- API: http://127.0.0.1:2024
- Docs: http://127.0.0.1:2024/docs
- LangGraph Studio Web UI: https://smith.langchain.com/studio/?baseUrl=http://127.0.0.1:2024
```
Just like that, your agent has an API on port 2024. The server parsed langgraph.json, found the graph, stood up every endpoint, and is now listening. Visit http://127.0.0.1:2024/docs to browse the full API reference.
If you want data that sticks around after a restart, use langgraph up instead:
```bash
langgraph up
```
This one builds a Docker image and brings up a PostgreSQL container alongside it. The first build takes a bit, but after that your chats survive server restarts — exactly what you’d want in a staging or QA setting.
> Warning: `langgraph dev` stores everything in memory. The moment you stop the process, every thread, every checkpoint, and every chat log disappears. Keep it for coding and testing only.

> Note: What `langgraph up` actually does behind the curtain: it creates a Docker Compose stack with two services — your agent and a PostgreSQL instance. The server writes checkpoints to PostgreSQL, giving you the same data model you’d get in the cloud.
How Do You Call the Server Using the LangGraph SDK?
Your server is up. Now you need a way to send it messages. Sure, you could craft raw HTTP calls — every endpoint is a standard REST route. But the SDK handles auth headers, JSON encoding, and streaming plumbing in one neat package.
Here’s how you connect. The get_client call takes a URL and hands back a client object you’ll use for everything:
```python
from langgraph_sdk import get_client

client = get_client(url="http://127.0.0.1:2024")

# Check which graphs are available
assistants = await client.assistants.search()
print(assistants)
```
You should see something like:
```text
[{'assistant_id': 'agent', 'graph_id': 'agent', ...}]
```
Every entry in the "graphs" section of langgraph.json registers as an “assistant” on the server. The names line up one-to-one.
How Do Threads and Messages Work?
A “thread” is simply a container for one conversation. It stores the full message history and all related state. You open a thread, then push runs (that’s the API’s word for messages) into it.
When you call client.runs.stream, the SDK sends your message to the agent and yields events as the graph runs. Each event carries one piece of the reply:
```python
# Create a new conversation thread
thread = await client.threads.create()
print(f"Thread ID: {thread['thread_id']}")

# Send a message and stream the response
async for event in client.runs.stream(
    thread_id=thread["thread_id"],
    assistant_id="agent",
    input={"messages": [{"role": "user", "content": "What's the weather in Tokyo?"}]},
):
    if event.event == "values":
        messages = event.data.get("messages", [])
        if messages:
            last = messages[-1]
            print(f"[{last['type']}]: {last.get('content', '')}")
```
The stream looks like this:
```text
[ai]:
[tool]: Sunny, 22C
[ai]: The weather in Tokyo is currently sunny with a temperature of 22C.
```
Three events paint the full picture. First, the LLM chose to invoke the weather tool (you see an AI message with no text but tool-call data attached). Second, the tool ran and returned “Sunny, 22C.” Third, the LLM took that result and wrote a polished reply for the user.
Does the Server Remember Past Messages?
Yes — and you don’t have to lift a finger. Send a follow-up to the same thread without any history:
```python
async for event in client.runs.stream(
    thread_id=thread["thread_id"],
    assistant_id="agent",
    input={"messages": [{"role": "user", "content": "How about London?"}]},
):
    if event.event == "values":
        messages = event.data.get("messages", [])
        if messages:
            last = messages[-1]
            print(f"[{last['type']}]: {last.get('content', '')}")
```
Output:
```text
[ai]:
[tool]: Cloudy, 15C
[ai]: The weather in London is currently cloudy with a temperature of 15C.
```
The agent knew “How about London?” was about weather because the server pulled the thread’s full history before running the graph. You sent a single new message. The server filled in the rest. That’s the same pattern used by every real-world chatbot.
> Key Insight: Your client never manages state. It sends a thread ID and a message. The server fetches the history, executes the graph with all the context, persists the new state, and pushes back the result. The client stays thin and stateless.
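A toy sketch of that division of labor (plain Python, no LangGraph) makes it concrete. The `run_on_thread` helper here is hypothetical and stands in for the server; the dict stands in for PostgreSQL:

```python
# Toy model of the server-side pattern: the client sends only a thread ID
# plus one new message; the server owns and persists the full history.
threads: dict[str, list[dict]] = {}  # stands in for the checkpoint database

def run_on_thread(thread_id: str, message: dict) -> list[dict]:
    """Hypothetical server handler: load history, run, persist, return state."""
    history = threads.setdefault(thread_id, [])
    history.append(message)
    # A real server would execute the graph here with the full history.
    history.append({"role": "ai", "content": f"(reply to: {message['content']})"})
    return history

run_on_thread("t1", {"role": "user", "content": "Weather in Tokyo?"})
state = run_on_thread("t1", {"role": "user", "content": "How about London?"})
print(len(state))  # 4: two user turns, two replies
```

The second call never re-sends the Tokyo exchange, yet the handler still sees it — which is exactly why the follow-up question above worked.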
What Endpoints Come Built-In?
You don’t get a single /invoke route. The server ships with over 30 endpoints organized around five core resources:
| Resource | What It Manages | Key Endpoints |
|---|---|---|
| Assistants | Graph configurations | POST /assistants, GET /assistants/search |
| Threads | Conversation state | POST /threads, GET /threads/{id}/state |
| Runs | Graph executions | POST /runs, POST /runs/stream, POST /runs/wait |
| Cron Jobs | Scheduled runs | POST /threads/{id}/runs/crons |
| Store | Long-term memory | PUT /store/items, POST /store/items/search |
Let me zoom in on the three ways to kick off a run:
- `POST /runs` — creates a background run and returns a run ID right away, letting you poll for the result. Best when tasks need minutes.
- `POST /runs/stream` — feeds you events while the graph works. Ideal for chat interfaces.
- `POST /runs/wait` — blocks until the graph is done and returns the final output. Good for fast queries that finish in seconds.
Curious how this looks at the HTTP level? Here is the same chat done with curl:
```bash
curl -X POST http://127.0.0.1:2024/threads \
  -H "Content-Type: application/json" \
  -d '{}'
```
Response:
```json
{"thread_id": "abc123-...", "created_at": "...", "metadata": {}}
```
Then trigger a run on that thread:
```bash
curl -X POST http://127.0.0.1:2024/threads/abc123/runs/wait \
  -H "Content-Type: application/json" \
  -d '{
    "assistant_id": "agent",
    "input": {"messages": [{"role": "user", "content": "Weather in New York?"}]}
  }'
```
The JSON that comes back has the whole graph output — tool calls, tool results, and the final answer. The SDK wraps these exact HTTP calls in friendlier methods.
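If you’d rather see that from Python without the SDK, the standard library can build the same request. This sketch only constructs it — actually sending it requires the server from earlier to be running, so that line is left commented out:

```python
import json
import urllib.request

BASE = "http://127.0.0.1:2024"
payload = {
    "assistant_id": "agent",
    "input": {"messages": [{"role": "user", "content": "Weather in New York?"}]},
}

# The same call the SDK makes for a blocking run on thread "abc123".
req = urllib.request.Request(
    f"{BASE}/threads/abc123/runs/wait",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.get_method(), req.full_url)

# To actually send it (server must be up):
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

Nothing magic is happening in the SDK — it assembles requests like this one and parses the JSON that comes back.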
How Do Background Runs Help with Slow Tasks?
Picture an agent that needs several minutes to dig through sources and draft a report. Holding an HTTP connection open that long is asking for timeouts.
LangGraph Server offers background runs for exactly this. You fire off the task, receive a run ID on the spot, and come back later to grab the result. The server feeds your graph into a task queue and handles it behind the scenes.
client.runs.create starts a background run and returns right away:
```python
run = await client.runs.create(
    thread_id=thread["thread_id"],
    assistant_id="agent",
    input={"messages": [{"role": "user", "content": "What's the weather in all three cities?"}]},
)
print(f"Run ID: {run['run_id']}")
print(f"Status: {run['status']}")
```
Instant response:
```text
Run ID: run-456...
Status: pending
```
Later, check progress:
```python
run_status = await client.runs.get(
    thread_id=thread["thread_id"],
    run_id=run["run_id"],
)
print(f"Status: {run_status['status']}")
```
```text
Status: success
```
Once the status flips to success, read the thread state to get the answer. This is the right pattern for any workload where tying up a web server thread while the agent thinks is not an option.
> Tip: Skip manual polling — use `client.runs.join`. Calling `await client.runs.join(thread_id, run_id)` waits until the run wraps up. It polls under the hood so you don’t need retry loops of your own.
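Under the hood, a join-style helper is just a poll loop with a timeout. Here’s a minimal sketch of that pattern — the `get_status` callable and the terminal state names are assumptions for illustration, not the SDK’s internals:

```python
import time

def wait_for_run(get_status, interval: float = 0.01, timeout: float = 5.0) -> str:
    """Poll get_status() until it reports a terminal state or we time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("success", "error"):
            return status
        time.sleep(interval)
    raise TimeoutError("run did not finish in time")

# Simulate a run whose status flips on the third poll.
statuses = iter(["pending", "running", "success"])
print(wait_for_run(lambda: next(statuses)))  # success
```

In real code you’d pass a callable that hits `client.runs.get` — or just use `client.runs.join` and let the SDK do this for you.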
What Are the Ways to Go Live — Local, Cloud, and Self-Hosted?
We’ve been running on localhost. When it’s time to serve real users, you have four paths. Each trades simplicity for control:
| Option | Where It Runs | Best For | State Storage | Cost |
|---|---|---|---|---|
| `langgraph dev` | Your laptop | Coding and testing | RAM only | Free |
| Cloud SaaS | LangSmith servers | Quick launch, small teams | Managed PostgreSQL | Pay as you go |
| BYOC | Your AWS/GCP VPC | Strict data rules | Your own database | License fee |
| Self-Hosted | Your own machines | Total control | Your own database | License fee |
Cloud Path
Cloud SaaS is the shortest route to a live URL. Your code sits in a GitHub repo. The platform builds and ships it for you.
Four steps and you’re done:
1. Push your project (including `langgraph.json`) to GitHub.
2. Connect the repo inside the LangSmith dashboard.
3. Fill in your API keys under deployment settings.
4. Hit deploy.
The platform creates a Docker image, sets up PostgreSQL, and hands you a URL. Your client code changes in exactly two spots:
```python
cloud_client = get_client(
    url="https://your-deployment-id.us.langgraph.app",
    api_key="your-langsmith-api-key",
)

# Everything else is identical
thread = await cloud_client.threads.create()
async for event in cloud_client.runs.stream(
    thread_id=thread["thread_id"],
    assistant_id="agent",
    input={"messages": [{"role": "user", "content": "Weather in Tokyo?"}]},
):
    print(event.data)
```
This is the SDK’s biggest selling point. Same code, swap the URL. Whether you point at localhost or the cloud, the client works the same way.
Self-Hosted with Docker
Want to run everything on your own servers? Build a container image from your project:
```bash
langgraph build -t my-agent-server
```
That image is fully portable. Push it to any registry and deploy on Kubernetes, ECS, Cloud Run — anything that speaks Docker.
The server expects a PostgreSQL instance for state. Hand it the connection string through an env var:
```bash
docker run -p 8123:8000 \
  -e OPENAI_API_KEY="your-key" \
  -e DATABASE_URI="postgresql://user:pass@host:5432/langgraph" \
  my-agent-server
```
> Warning: Never bake secrets into your Docker images. Rely on env vars, a secrets manager like AWS Secrets Manager or HashiCorp Vault, or Kubernetes secrets. A key baked into an image layer is a breach waiting to happen.
How Do Assistants Let You Tweak Behavior Without Redeploying?
Sooner or later you’ll face this: you want to A/B test two system prompts. Or your customer team wants a friendly tone while the dev-facing API keeps things brief.
Assistants are the answer. An assistant is a named profile for your graph. Same code underneath, different settings on top.
Create one with client.assistants.create. The config dictionary carries the settings your graph reads at runtime:
```python
assistant = await client.assistants.create(
    graph_id="agent",
    config={
        "configurable": {
            "system_prompt": "You are a concise weather bot. Temperature in Celsius and Fahrenheit."
        }
    },
    name="weather-expert",
)
print(f"Assistant ID: {assistant['assistant_id']}")
```
```text
Assistant ID: asst-789...
```
Want to adjust the prompt a week later? Hit the API — zero downtime, zero deploys:
```python
updated = await client.assistants.update(
    assistant_id=assistant["assistant_id"],
    config={
        "configurable": {
            "system_prompt": "You are a detailed weather bot. Include temperature, humidity, and wind."
        }
    },
)
```
This is how prompt work actually plays out in live systems. You push code once and then refine the wording through API calls. I find this far better than redeploying every time a prompt changes — it’s quicker and lower risk.
Tip: > Use assistants to separate use cases. One graph, many profiles: “support-bot” for end users, “dev-bot” for your team, “test-bot” for QA. Each carries its own system prompt, model pick, and tool list — without copying a single line of code.
How Do You Pick the Right Streaming Mode?
Different consumers want different levels of detail. A chatbot needs to show words appearing one at a time. A monitoring dashboard only cares about state diffs. A debugger wants the full picture. The server gives you four modes to choose from:
| Mode | What You Receive | Typical Use |
|---|---|---|
| `values` | Full state snapshot after each node | Debugging and auditing |
| `messages` | Individual LLM tokens as they’re produced | Chat UIs |
| `updates` | Only the fields each node changed | Live dashboards |
| `events` | Low-level LangGraph events | Custom pipeline logic |
For a chat interface, go with messages. It pushes each token the moment the LLM writes it — the live-typing feel you know from ChatGPT:
```python
async for event in client.runs.stream(
    thread_id=thread["thread_id"],
    assistant_id="agent",
    input={"messages": [{"role": "user", "content": "Tell me about Tokyo's weather"}]},
    stream_mode="messages",
):
    if hasattr(event, 'data') and event.data:
        if isinstance(event.data, dict) and "content" in event.data:
            print(event.data["content"], end="", flush=True)
```
Every print renders one token. On the frontend, text appears word by word.
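The consumption pattern on the frontend is the same regardless of transport: append each incoming token to the visible text. This toy generator mimics how `stream_mode="messages"` delivers tokens, with no server involved:

```python
import time

def fake_token_stream(text: str, delay: float = 0.0):
    """Yield one token at a time, mimicking a streamed LLM reply."""
    for token in text.split(" "):
        time.sleep(delay)  # simulate generation latency
        yield token + " "

# A UI would append each chunk to the chat bubble as it arrives.
out = []
for tok in fake_token_stream("Tokyo is sunny and 22C"):
    out.append(tok)
print("".join(out).strip())
```

Swap the generator for the real event stream and the loop body stays identical — that’s the appeal of token-level streaming.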
When something goes wrong, switch to values mode. It shows the full state after every node:
```python
async for event in client.runs.stream(
    thread_id=thread["thread_id"],
    assistant_id="agent",
    input={"messages": [{"role": "user", "content": "Weather in London?"}]},
    stream_mode="values",
):
    if event.event == "values":
        msgs = event.data.get("messages", [])
        print(f"--- {len(msgs)} messages in state ---")
        for m in msgs:
            print(f"  [{m['type']}]: {m.get('content', '[tool call]')[:60]}")
```
Now you watch the state grow at each step — user message, LLM tool request, tool reply, final answer. When a run goes sideways, this view pinpoints exactly where things went off track.
Common Mistakes and How to Fix Them
Mistake 1: Wrong Graph Variable Name in langgraph.json
```json
{
  "graphs": {
    "agent": "./agent.py:app"
  }
}
```
Why it breaks: Your code defines graph = graph_builder.compile(), but the config references :app. The server tries to import an object that doesn’t exist and throws an error.
The fix: Make the name after the colon match your actual variable:
```json
{
  "graphs": {
    "agent": "./agent.py:graph"
  }
}
```
Mistake 2: Running langgraph dev in Production
```bash
# This will lose user data on every restart
langgraph dev --host 0.0.0.0
```
Why it breaks: Everything lives in RAM. The next restart wipes every thread and every conversation. Users lose all their history.
The fix:
```bash
langgraph up
```
Mistake 3: Missing Environment Variables
```bash
langgraph dev
# Server starts fine, but first request crashes:
# ERROR: OPENAI_API_KEY not set
```
Why it fools you: The server boots without complaint even when keys are missing. The crash only shows up when the first real request reaches the LLM. Inside a Docker container, this is even trickier to spot.
The fix: Put a .env file in your project and reference it from langgraph.json:
```text
# .env
OPENAI_API_KEY=sk-your-key-here
LANGSMITH_API_KEY=your-langsmith-key
```
> Warning: Make sure `.env` is in `.gitignore`. When the secrets file sits next to `langgraph.json`, it’s dangerously easy to commit it. Add the exclusion before your very first commit.
Exercise 1: Deploy and Query Your Agent
Time to wire everything up yourself. Launch the weather agent locally, then hold a two-turn chat using the SDK.
Difficulty: advanced

Task: Using the LangGraph SDK, create a thread, ask about the weather in Tokyo, then follow up by asking about London on the SAME thread. Print both AI responses. The agent should call the `get_weather` tool for each city.

Starter code:

```python
from langgraph_sdk import get_client

client = get_client(url="http://127.0.0.1:2024")

# Step 1: Create a thread
thread = await client.threads.create()

# Step 2: Send first message about Tokyo
# YOUR CODE HERE — use client.runs.stream

# Step 3: Send follow-up about London on the SAME thread
# YOUR CODE HERE

print("DONE")
```

Hints:

- Use `client.runs.stream` with `thread_id=thread["thread_id"]`, `assistant_id="agent"`, and `input={"messages": [{"role": "user", "content": "..."}]}`.
- For the follow-up, use the same `thread_id`. The server loads conversation history automatically.

Solution:

```python
from langgraph_sdk import get_client

client = get_client(url="http://127.0.0.1:2024")

thread = await client.threads.create()

async for event in client.runs.stream(
    thread_id=thread["thread_id"],
    assistant_id="agent",
    input={"messages": [{"role": "user", "content": "What is the weather in Tokyo?"}]},
    stream_mode="values",
):
    if event.event == "values":
        msgs = event.data.get("messages", [])
        if msgs:
            print(f"Tokyo: {msgs[-1].get('content', '')}")

async for event in client.runs.stream(
    thread_id=thread["thread_id"],
    assistant_id="agent",
    input={"messages": [{"role": "user", "content": "How about London?"}]},
    stream_mode="values",
):
    if event.event == "values":
        msgs = event.data.get("messages", [])
        if msgs:
            print(f"London: {msgs[-1].get('content', '')}")

print("DONE")
```

Why it works: Reusing the same `thread_id` is the key. The server loads the full conversation history, so the agent understands that "How about London?" refers to weather from the previous message.
Exercise 2: Create a Custom Assistant
Now build your own assistant with a different personality.
Difficulty: advanced

Task: Create a new assistant named "brief-weather-bot" from the "agent" graph. Give it a system prompt: "Reply with just the city and temperature, nothing else." Then query it about New York weather.

Starter code:

```python
from langgraph_sdk import get_client

client = get_client(url="http://127.0.0.1:2024")

# Step 1: Create a custom assistant
assistant = await client.assistants.create(
    # YOUR CODE HERE
)

# Step 2: Create a thread and query YOUR assistant
thread = await client.threads.create()
# YOUR CODE HERE

print("DONE")
```

Hints:

- `assistants.create` needs `graph_id="agent"`, `name="brief-weather-bot"`, and `config={"configurable": {"system_prompt": "..."}}`.
- In `client.runs.stream`, use `assistant["assistant_id"]` — not `"agent"` — as the `assistant_id`.

Solution:

```python
from langgraph_sdk import get_client

client = get_client(url="http://127.0.0.1:2024")

assistant = await client.assistants.create(
    graph_id="agent",
    name="brief-weather-bot",
    config={"configurable": {"system_prompt": "Reply with just the city and temperature, nothing else."}},
)

thread = await client.threads.create()
async for event in client.runs.stream(
    thread_id=thread["thread_id"],
    assistant_id=assistant["assistant_id"],
    input={"messages": [{"role": "user", "content": "Weather in New York?"}]},
    stream_mode="values",
):
    if event.event == "values":
        msgs = event.data.get("messages", [])
        if msgs:
            print(msgs[-1].get("content", ""))

print("DONE")
```

Why it works: Creating an assistant from the same graph gives you different behavior without touching graph code. The `assistant_id` from the response replaces the default "agent" name in your run calls.
When Should You NOT Use LangGraph Platform?
Powerful as it is, the platform is overkill in a few situations.
One-shot chains with no memory. If your workflow is just “prompt in, answer out” with no state and no tools, a bare FastAPI endpoint does the job at lower cost.
Microsecond-sensitive paths. The server adds overhead for state management, checkpoints, and the HTTP layer. When every millisecond counts, call the LLM directly.
> Tip: Use `langgraph dev` as a quick smoke test. If the local server covers what you need, then choose a hosting path. The free self-hosted tier covers up to 1 million node runs.
Heavy investment in another framework. If your team already runs CrewAI or AutoGen, switching just for the hosting layer isn’t worth the migration cost. Reach for BentoML or a FastAPI wrapper instead.
Avoiding vendor lock-in. Cloud SaaS ties you to LangChain’s servers. If that’s a concern, grab the self-hosted Docker option — it runs fully on your own gear.
Summary
LangGraph Platform takes a graph you built in a notebook and makes it available as a live API. Point langgraph.json at your graph, run langgraph dev, and you instantly get a server with streaming, state storage, and thread management built in.
The SDK keeps things simple on the client side. Create threads, send messages, stream answers — and the exact same code works whether you’re hitting localhost or a cloud URL. For slow workloads, background runs offload the wait. For prompt experiments, assistants let you change behavior on the fly without pushing new code.
Practice exercise: Extend the weather agent with a get_stock_price tool. Deploy it using langgraph dev. Spin up two assistants — one that gives short answers and one that gives detailed answers. Ask both the same question and compare what each returns.
Frequently Asked Questions
Can I use LangGraph Server without LangSmith?
Yes. The self-hosted options — Docker through langgraph up or a custom image from langgraph build — run entirely on your own machines with no LangSmith tie. You only need LangSmith for cloud SaaS hosting and the Studio web UI.
Does the server support JavaScript agents?
The server itself runs Python agents only. However, the SDK client ships in both Python and JavaScript flavors. So your agent logic stays in Python while your Node.js frontend talks to it through the JS SDK: import { Client } from "@langchain/langgraph-sdk".
How does pricing work?
The free self-hosted tier covers up to 1 million node runs. Cloud SaaS follows a pay-as-you-go model through LangSmith plans. BYOC and enterprise tiers need a license. See LangChain’s pricing page for current numbers.
Can I run multiple agents on one server?
Absolutely. Add more entries under "graphs" in langgraph.json. Each one becomes its own named assistant. Clients address each by name — a clean way to host several focused agents on shared hardware.
```json
{
  "graphs": {
    "weather-agent": "./agents/weather.py:graph",
    "support-agent": "./agents/support.py:graph"
  }
}
```