LangGraph Memory Systems — Short-Term, Long-Term, and Conversation Memory Explained
Your chatbot nails a tough question. The user asks “can you say that more simply?” — and the bot has no idea what “that” means. Without memory, each call starts blank. The chat feels broken, and the user gives up.
Memory is the fix. It turns a raw LLM call into a smooth, aware agent. LangGraph ships three kinds of memory for this. Short-term memory keeps the current chat intact. Summary memory squeezes old messages so you don’t blow your token budget. Long-term memory saves facts across sessions so your agent knows returning users.
Before we build, let’s see how the pieces connect.
Every message in a chat lands in state — that’s short-term memory. But context windows have a ceiling. When the chat grows long, you have two options: trim old messages with a sliding window, or crush them into a summary. Both keep the LLM under its token limit.
Long-term memory works in a totally different way. It lives outside the chat thread, in a thing called a Store. Your agent writes facts there — “this user likes Python over JavaScript” — and reads them back in later sessions. The Store lives across threads, so a user who shows up next week gets a custom experience.
We’ll build each type step by step, then blend them into one agent.
What Is Memory in LangGraph?
In LangGraph, “memory” simply means data your agent can read from earlier turns. There’s no magic layer — it’s just info that outlasts a single invoke() call.
The framework draws a line between two scopes:
- Short-term memory — locked to one thread (one chat). It’s the running list of messages. A checkpointer writes it to storage after each graph run.
- Long-term memory — crosses thread lines. It sits in a Store and persists across totally separate chats. Great for user prefs, learned facts, or skills the agent picks up over time.
Below is every import we’ll touch in this guide — the LLM wrapper, message types, the trim_messages helper for capping context, and LangGraph’s graph-building and memory tools.
import os
import re
from typing import Annotated, TypedDict
from langchain_openai import ChatOpenAI
from langchain_core.messages import (
HumanMessage,
AIMessage,
SystemMessage,
RemoveMessage,
)
from langchain_core.messages.utils import (
trim_messages,
count_tokens_approximately,
)
from langchain_core.runnables import RunnableConfig
from langgraph.graph import (
StateGraph,
MessagesState,
START,
END,
add_messages,
)
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.store.memory import InMemoryStore
from langgraph.store.base import BaseStore
model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
Before You Start
- Python: 3.10+
- Packages: langgraph (0.4+), langchain-openai (0.3+), langchain-core (0.3+)
- Install: pip install langgraph langchain-openai langchain-core
- API key: Set OPENAI_API_KEY in your shell. See OpenAI’s docs to create one.
- Time: ~30 minutes
- Background: Basic LangGraph state and checkpointing ideas.
KEY INSIGHT: Short-term memory handles “what did we just say?” Long-term memory handles “what do I already know about this person?” Two different jobs, two different LangGraph APIs.
How Does Short-Term Memory Work?
Why does this matter? Without it, your bot can’t even handle “yes” as a follow-up. Each turn is a clean slate.
This is the simplest memory type. If your graph uses MessagesState and you attach a checkpointer, you already have it. The checkpointer writes the full message list to storage after each run. Next time the same thread fires the graph, those messages reload on their own.
Here’s a minimal chatbot with short-term memory. InMemorySaver holds state in RAM. The thread_id in the config tells LangGraph which chat to open.
def chatbot(state: MessagesState):
response = model.invoke(state["messages"])
return {"messages": [response]}
graph = StateGraph(MessagesState)
graph.add_node("chatbot", chatbot)
graph.add_edge(START, "chatbot")
graph.add_edge("chatbot", END)
checkpointer = InMemorySaver()
app = graph.compile(checkpointer=checkpointer)
When two calls use the same thread_id, they share a single message timeline. Let’s test: send a fact, then ask a question that only works if the bot kept that fact.
config = {"configurable": {"thread_id": "user-123"}}
response1 = app.invoke(
{"messages": [HumanMessage(content="My name is Alex.")]},
config=config,
)
print(response1["messages"][-1].content)
The model says hello to Alex. Now for the real test:
response2 = app.invoke(
{"messages": [HumanMessage(content="What's my name?")]},
config=config,
)
print(response2["messages"][-1].content)
It answers “Alex” — because the checkpointer restored all earlier messages before this run kicked off. Without that one-line setup, the bot would know nothing.
That’s short-term memory in a nutshell. Simple, but there’s a catch lurking underneath.
Why Can’t You Keep Every Message?
Every saved message gets sent to the LLM on the next call. After 50 rounds of back and forth, your message list could hold 100+ entries. That’s thousands of tokens — and you’re paying for every one.
Even models with 128K context windows don’t make this okay. Most old turns are noise by now. The user asked about Python generators ages ago — is that still useful when they’re asking about memory today?
You need a strategy. LangGraph gives you two: sliding windows (chop old messages) and summary memory (compress them).
How Does Sliding Window Memory Work?
This is the easiest approach. Before calling the LLM, you chop off the oldest messages and keep only the freshest ones. Those old messages still live in the checkpoint — you just don’t feed them to the model.
LangChain’s trim_messages handles the work. Give it the full message list and a token budget, and it hands back a trimmed copy. strategy="last" keeps the newest turns. start_on="human" makes sure the trimmed output always opens with a user message — models act weird when a chat starts with a half-finished AI reply.
def chatbot_windowed(state: MessagesState):
trimmed = trim_messages(
state["messages"],
strategy="last",
token_counter=count_tokens_approximately,
max_tokens=500,
start_on="human",
)
response = model.invoke(trimmed)
return {"messages": [response]}
The rest of the graph stays the same — only the node function changes.
graph_w = StateGraph(MessagesState)
graph_w.add_node("chatbot", chatbot_windowed)
graph_w.add_edge(START, "chatbot")
graph_w.add_edge("chatbot", END)
app_windowed = graph_w.compile(checkpointer=InMemorySaver())
What does this look like in action? The bot recalls fresh messages but forgets anything that aged out of the token window.
config_w = {"configurable": {"thread_id": "window-demo"}}
app_windowed.invoke(
{"messages": [HumanMessage(content="My name is Alex.")]},
config=config_w,
)
app_windowed.invoke(
{"messages": [HumanMessage(content="I work at a startup.")]},
config=config_w,
)
app_windowed.invoke(
{"messages": [HumanMessage(content="We use Python and FastAPI.")]},
config=config_w,
)
After a few more turns, the oldest messages vanish from the LLM’s view. The checkpoint keeps everything — the model just sees what fits in 500 tokens.
TIP: Pick your max_tokens based on the job, not the model’s limit. A support bot might need 2,000 tokens of history. A coding helper might need 8,000 to keep track of the function you’re building. Start low and raise it if the bot forgets too much.
Quick Check: What if max_tokens=100 and the user’s latest message alone is 150 tokens? With the default allow_partial=False, trim_messages only returns whole messages that fit the budget — so it can come back empty. If some context must survive an over-tight budget, raise max_tokens or set allow_partial=True to keep a truncated slice of the newest message.
Exercise 1: Build a Windowed Chatbot (intermediate)
Create a chatbot node that trims messages to the last 300 tokens using trim_messages. Use count_tokens_approximately as the token counter and set start_on="human". The function should take a MessagesState and return a dict with the model response.
Starter code:
def windowed_chatbot(state: MessagesState):
    # Trim messages to last 300 tokens
    trimmed = trim_messages(
        state["messages"],
        # Add the missing parameters here
    )
    response = model.invoke(trimmed)
    return {"messages": [response]}
Hint: you need three keyword arguments (strategy="last", token_counter=count_tokens_approximately, max_tokens=300), plus start_on="human" so the trimmed list opens with a human message.
Solution:
def windowed_chatbot(state: MessagesState):
    trimmed = trim_messages(
        state["messages"],
        strategy="last",
        token_counter=count_tokens_approximately,
        max_tokens=300,
        start_on="human",
    )
    response = model.invoke(trimmed)
    return {"messages": [response]}
The trim_messages function takes the full message list and returns only the most recent messages that fit within 300 tokens. Setting start_on="human" ensures the trimmed result starts with a human message, which prevents confusing the LLM with an orphaned AI response.
How Does Summary Memory Work?
Windows are simple but lossy. Once a message drops off, that info is gone forever. Summary memory splits the difference: before throwing out old turns, you ask the LLM to boil them down. A short recap takes their place — key facts kept, token count slashed.
You’ll need a custom state that holds both the message list and a running summary string.
class SummaryState(TypedDict):
messages: Annotated[list, add_messages]
summary: str
The chatbot node checks for a saved summary. If one exists, it pastes it at the top as a system message and trims to just the latest turns. The LLM then sees two things: a compressed recap of older turns, plus the freshest messages in full.
def chatbot_with_summary(state: SummaryState):
messages = state["messages"]
summary = state.get("summary", "")
if summary:
system_msg = SystemMessage(
content=f"Conversation summary: {summary}"
)
recent = trim_messages(
messages,
strategy="last",
token_counter=count_tokens_approximately,
max_tokens=300,
start_on="human",
)
full_context = [system_msg] + recent
else:
full_context = messages
response = model.invoke(full_context)
return {"messages": [response]}
So who does the summarizing? A separate node. It fires when the message count goes over a threshold. This node asks the LLM to blend the new turns into the running recap, then uses RemoveMessage to strip old entries from state. We hold on to the last 2 messages so the flow feels natural.
def should_summarize(state: SummaryState):
"""Route to summarizer when messages pile up."""
if len(state["messages"]) > 10:
return "summarize"
return END
def summarize_conversation(state: SummaryState):
messages = state["messages"]
existing = state.get("summary", "")
if existing:
prompt = (
f"Current summary:\n{existing}\n\n"
"Extend this summary with the new messages. "
"Capture key facts, decisions, preferences."
)
else:
prompt = (
"Summarize the conversation so far. "
"Capture key facts, decisions, preferences."
)
summary_msgs = messages + [HumanMessage(content=prompt)]
new_summary = model.invoke(summary_msgs)
# Keep last 2 messages, remove the rest
delete = [
RemoveMessage(id=m.id) for m in messages[:-2]
]
return {
"summary": new_summary.content,
"messages": delete,
}
WARNING: RemoveMessage doesn’t erase anything from the checkpoint log. It just drops messages from the live state so the LLM doesn’t see them. The full history stays in the checkpoint — you can still replay or inspect every past turn.
Now wire the pieces: chatbot runs first, then a branch decides if the summary node should fire.
graph_s = StateGraph(SummaryState)
graph_s.add_node("chatbot", chatbot_with_summary)
graph_s.add_node("summarize", summarize_conversation)
graph_s.add_edge(START, "chatbot")
graph_s.add_conditional_edges(
"chatbot",
should_summarize,
{"summarize": "summarize", END: END},
)
graph_s.add_edge("summarize", END)
app_summary = graph_s.compile(checkpointer=InMemorySaver())
Once the count crosses 10, the summary node kicks in by itself. Old turns vanish, replaced by a tight recap. The LLM gets the recap plus the 2 newest messages — full awareness without the bloated token bill.
KEY INSIGHT: Summary memory swaps compute for context. You burn one extra LLM call to shrink old turns, but you save tokens on every future call. For chats that run past 20 rounds, the trade-off is almost always worth it.
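A quick back-of-envelope sketch shows the shape of that trade-off. Every number below is an illustrative assumption, not a measurement:

```python
# Back-of-envelope token math (all values are invented for illustration).
turns = 30                 # human/AI round trips so far
tokens_per_turn = 80       # average tokens per round trip
summary_tokens = 150       # size of the compressed recap
recent_kept = 2            # messages kept verbatim after summarizing

full_history = turns * tokens_per_turn
with_summary = summary_tokens + recent_kept * (tokens_per_turn // 2)

print(f"full history: {full_history} tokens per call")
print(f"summary + recent: {with_summary} tokens per call")
```

Under these assumptions the summarized context is roughly a tenth the size of the raw history, and the gap widens with every additional turn.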
Exercise 2: Write the Summarization Trigger (intermediate)
Write a should_summarize function that checks whether the message count exceeds a threshold. If len(state["messages"]) is greater than 6, return "summarize". Otherwise, return END. The function takes a SummaryState dict.
Starter code:
from langgraph.graph import END

def should_summarize(state: SummaryState):
    # Check message count and return route
    pass
Hint: compare len(state["messages"]) to 6 with an if statement.
Solution:
def should_summarize(state: SummaryState):
    if len(state["messages"]) > 6:
        return "summarize"
    return END
The function checks the message count against the threshold. When messages exceed the limit, it routes to the summarize node; otherwise the graph ends normally. You can adjust the threshold based on your token budget.
How Does Long-Term Memory Work with the Store?
Short-term memory is locked to one thread. Start a new thread and the agent has a blank slate. Fine for quick chats. But when a user returns next week and says “use the same settings as last time,” the agent draws a total blank.
That’s where long-term memory comes in. It relies on the Store — a key-value database that lives outside the graph’s state. You put facts in and get them out whenever you want. The Store sorts data by namespaces (think: folders) and keys (think: file names). Because it sits outside any single thread, the data survives across sessions.
InMemoryStore is the dev-friendly version. In production you’d plug in PostgresStore or a MongoDB backend. The code looks the same either way.
Here’s the basic pattern. Save items as dicts under a namespace tuple and a string key.
store = InMemoryStore()
store.put(
namespace=("users", "alex"),
key="preference",
value={"language": "Python", "framework": "FastAPI"},
)
item = store.get(
namespace=("users", "alex"), key="preference"
)
print(item.value)
{'language': 'Python', 'framework': 'FastAPI'}
Need all the facts for one user? Search the namespace.
memories = store.search(namespace=("users", "alex"))
for memory in memories:
print(f"{memory.key}: {memory.value}")
preference: {'language': 'Python', 'framework': 'FastAPI'}
NOTE: InMemoryStore dies with your Python process. For real apps, swap in PostgresStore (from langgraph-checkpoint-postgres) or a Redis/MongoDB backend. Your code won’t change — only the class name does.
The real magic shows when you hook the Store into a graph. Pass store=... at compile time and LangGraph injects it into your node functions for you. Nodes access it through a keyword-only store argument.
Below is a chatbot that looks up user prefs in the Store before it responds. The user_id comes from the config — in a real app, your auth layer would set it.
def chatbot_with_memory(
state: MessagesState,
config: RunnableConfig,
*,
store: BaseStore,
):
user_id = config["configurable"]["user_id"]
memories = store.search(namespace=("users", user_id))
memory_text = "\n".join(
f"- {m.key}: {m.value}" for m in memories
)
system_msg = SystemMessage(
content=(
"You are a helpful assistant. "
"User info:\n" + memory_text + "\n"
"If the user shares new preferences, "
"note them as [MEMORY: key=value]."
)
)
response = model.invoke([system_msg] + state["messages"])
return {"messages": [response]}
Who writes those facts to disk? A second node. It scans the agent’s output for [MEMORY: key=value] tags and pushes each match into the Store.
def save_memories(
state: MessagesState,
config: RunnableConfig,
*,
store: BaseStore,
):
user_id = config["configurable"]["user_id"]
last_message = state["messages"][-1].content
pattern = r"\[MEMORY:\s*(\w+)=(.+?)\]"
matches = re.findall(pattern, last_message)
for key, value in matches:
store.put(
namespace=("users", user_id),
key=key,
value={"fact": value.strip()},
)
return {"messages": []}
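You can sanity-check the tag regex on its own before wiring it into the graph. The reply text below is invented to mimic what the system prompt asks the model to emit:

```python
import re

# A hypothetical model reply containing [MEMORY: key=value] tags.
sample_reply = (
    "Noted! I'll remember those preferences. "
    "[MEMORY: theme=dark mode] [MEMORY: python_version=3.12]"
)

# Same pattern as save_memories: a word-character key, then a lazy
# match for the value up to the closing bracket.
pattern = r"\[MEMORY:\s*(\w+)=(.+?)\]"
matches = re.findall(pattern, sample_reply)
print(matches)  # [('theme', 'dark mode'), ('python_version', '3.12')]
```

Note that the key must be a single \w+ token (letters, digits, underscores), which nudges the model toward clean identifiers like python_version rather than free-form phrases.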
When you compile, pass both the checkpointer (for short-term chat history) and the store (for long-term facts).
graph_lt = StateGraph(MessagesState)
graph_lt.add_node("chatbot", chatbot_with_memory)
graph_lt.add_node("save_memories", save_memories)
graph_lt.add_edge(START, "chatbot")
graph_lt.add_edge("chatbot", "save_memories")
graph_lt.add_edge("save_memories", END)
app_longterm = graph_lt.compile(
checkpointer=InMemorySaver(),
store=store,
)
Let’s prove it works across sessions. In session 1 we’ll teach the agent a fact. In session 2 (fresh thread, same user) we’ll ask for it back.
config1 = {
"configurable": {
"thread_id": "session-1",
"user_id": "alex",
}
}
r1 = app_longterm.invoke(
{"messages": [HumanMessage(
content="I prefer dark mode and Python 3.12."
)]},
config=config1,
)
print(r1["messages"][-1].content)
The agent sees the prefs and marks them with [MEMORY: ...] tags in its reply. The save_memories node picks up those tags and pushes them into the Store.
config2 = {
"configurable": {
"thread_id": "session-2",
"user_id": "alex",
}
}
r2 = app_longterm.invoke(
{"messages": [HumanMessage(
content="What do you remember about me?"
)]},
config=config2,
)
print(r2["messages"][-1].content)
The agent recalls dark mode and Python 3.12 — on a thread it has never seen before. That’s cross-session memory doing its job.
TIP: Use clean, structured keys — not free-form blobs. Names like language_preference, framework, timezone make lookups solid and debugging painless.
Exercise 3: Store and Retrieve a User Preference (intermediate)
Using an InMemoryStore, store a memory with namespace ("users", "bob"), key "editor", and value {"tool": "VS Code"}. Then retrieve it with store.get() and print the value dict.
Starter code:
from langgraph.store.memory import InMemoryStore

store = InMemoryStore()

# Store the memory
# YOUR CODE HERE

# Retrieve and print it
# YOUR CODE HERE
Solution:
from langgraph.store.memory import InMemoryStore

store = InMemoryStore()
store.put(namespace=("users", "bob"), key="editor", value={"tool": "VS Code"})

item = store.get(namespace=("users", "bob"), key="editor")
print(item.value)
The put method stores a dict under a namespace and key; get retrieves it by the same namespace and key. Namespaces are tuples of strings that act like folder paths, scoping memories to specific users or contexts.
How Do the Memory Types Compare?
| Feature | Short-Term (Checkpointer) | Sliding Window | Summary Memory | Long-Term (Store) |
|---|---|---|---|---|
| Scope | Single thread | Single thread | Single thread | Cross-thread |
| Persistence | Until thread ends | Until thread ends | Until thread ends | Stays forever |
| Token cost | Grows with chat | Fixed cap | Fixed cap | Tiny (on demand) |
| Info loss | None | Old messages gone | Compressed, some detail lost | None |
| Best for | Short chats | Support, Q&A | Long brainstorms | User profiles, prefs |
| Setup | One line (checkpointer) | Add trim_messages | Custom state + summarize node | Store + read/write nodes |
KEY INSIGHT: Most shipped agents layer at least two types. Short-term for the active thread, long-term for user facts across visits. Summary memory is worth adding only when chats regularly run past 20 turns.
How Do You Combine Memory Types in Production?
In practice, you don’t choose just one — you stack them. Here’s the recipe I’d use for most apps: the Store for lasting user facts, summary memory to compress aging turns, and a sliding window as the final safety net before the LLM call.
The state carries both messages and a summary. The node loads long-term facts from the Store, drops them in as context, tacks on the running recap, and trims to the freshest turns.
def production_chatbot(
state: SummaryState,
config: RunnableConfig,
*,
store: BaseStore,
):
user_id = config["configurable"]["user_id"]
summary = state.get("summary", "")
# 1. Load long-term memories
memories = store.search(namespace=("users", user_id))
mem_text = "\n".join(
f"- {m.key}: {m.value}" for m in memories
)
# 2. Build system context
parts = ["You are a helpful assistant."]
if mem_text:
parts.append(f"User profile:\n{mem_text}")
if summary:
parts.append(f"Conversation so far:\n{summary}")
system_msg = SystemMessage(content="\n\n".join(parts))
# 3. Trim recent messages
recent = trim_messages(
state["messages"],
strategy="last",
token_counter=count_tokens_approximately,
max_tokens=1000,
start_on="human",
)
response = model.invoke([system_msg] + recent)
return {"messages": [response]}
One function, three data sources: the Store (lasting facts), the summary string (compressed older turns), and the trimmed list (live messages). The LLM gets the whole picture without burning extra tokens.
UNDER-THE-HOOD: The order matters here. System message comes first, then the recent turns. The LLM reads the system message as background context and the recent turns as the active chat. Placing the summary in the system prompt (rather than as a user turn) stops the model from trying to “answer” the summary itself.
When Should You Use Each Memory Type?
The right choice depends on what your app actually does — not on how clever the setup looks.
Short-term only works when chats are brief. A quick Q&A bot that wraps up in 3–5 turns needs nothing else. Plug in a checkpointer and move on.
Add a sliding window when chats could stretch but early turns rarely matter. Support bots fit this well. The answer usually hangs on the last few messages, not the hello from 20 minutes ago.
Add summary memory when old context still matters but you can’t afford to keep every message. Think project-planning sessions where a choice from an hour ago still shapes the outcome — but the exact phrasing doesn’t.
Add long-term memory when users return across sessions. Personal helpers, tutoring apps, and enterprise copilots live here. If a user says “use the same format as last time” and your agent is lost, you need a Store.
WARNING: Don’t add memory layers you don’t need. Each one means more state, more nodes, and more things that can break. Pick the simplest setup that works. Stack more only when your users run into limits.
What Are the Most Common Memory Mistakes?
Mistake 1: No checkpointer
Wrong:
app = graph.compile() # No checkpointer!
What goes wrong: State doesn’t survive between calls. Every invoke() begins with an empty message list. The bot has zero recall.
Fix:
app = graph.compile(checkpointer=InMemorySaver())
Mistake 2: Feeding the full history to the LLM
Wrong:
def chatbot(state: MessagesState):
response = model.invoke(state["messages"])
return {"messages": [response]}
What goes wrong: After 50 turns, state["messages"] might hold thousands of tokens. You blow past the context limit and your API costs spike.
Fix:
def chatbot(state: MessagesState):
trimmed = trim_messages(
state["messages"],
strategy="last",
token_counter=count_tokens_approximately,
max_tokens=2000,
)
response = model.invoke(trimmed)
return {"messages": [response]}
Mistake 3: Sharing thread_id across users
Wrong:
config = {"configurable": {"thread_id": "main"}}
# Alice and Bob both use "main" — they see each other's messages!
What goes wrong: Thread IDs control short-term memory scope. Same ID = shared history. That’s a privacy leak waiting to happen.
Fix:
config = {"configurable": {"thread_id": f"user-{user_id}"}}
Mistake 4: Mixing up Store namespaces and thread IDs
These are two independent systems. Thread IDs gate short-term memory (the message list). Store namespaces gate long-term memory (saved facts).
# Short-term: scoped by thread_id
config = {"configurable": {"thread_id": "session-42"}}
# Long-term: scoped by namespace
store.put(namespace=("users", "alex"), key="pref", value={...})
Thread IDs rotate each session. Namespaces stay fixed for the same user forever. Keep them straight.
Predict the output: Your graph uses summary memory. The recap says “User is building a FastAPI app.” The last 2 turns cover database choices. The user asks “What framework am I using?” Can the agent answer?
Yes — the recap lands as a system message, so the LLM sees it. Even though the framework chat was many turns ago, the summary kept it.
Complete Code
The full script below is self-contained: copy, paste, and run.
# Complete code from: Memory Systems in LangGraph
# Requires: pip install langgraph langchain-openai langchain-core
# Python 3.10+
import os
import re
from typing import Annotated, TypedDict
from langchain_openai import ChatOpenAI
from langchain_core.messages import (
HumanMessage,
AIMessage,
SystemMessage,
RemoveMessage,
)
from langchain_core.messages.utils import (
trim_messages,
count_tokens_approximately,
)
from langchain_core.runnables import RunnableConfig
from langgraph.graph import (
StateGraph,
MessagesState,
START,
END,
add_messages,
)
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.store.memory import InMemoryStore
from langgraph.store.base import BaseStore
model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# --- 1. Basic short-term memory ---
def chatbot_basic(state: MessagesState):
response = model.invoke(state["messages"])
return {"messages": [response]}
graph_basic = StateGraph(MessagesState)
graph_basic.add_node("chatbot", chatbot_basic)
graph_basic.add_edge(START, "chatbot")
graph_basic.add_edge("chatbot", END)
app_basic = graph_basic.compile(checkpointer=InMemorySaver())
config_basic = {"configurable": {"thread_id": "demo-1"}}
r1 = app_basic.invoke(
{"messages": [HumanMessage(content="My name is Alex.")]},
config=config_basic,
)
print("Basic memory test:")
print(r1["messages"][-1].content)
r2 = app_basic.invoke(
{"messages": [HumanMessage(content="What's my name?")]},
config=config_basic,
)
print(r2["messages"][-1].content)
# --- 2. Sliding window memory ---
def chatbot_windowed(state: MessagesState):
trimmed = trim_messages(
state["messages"],
strategy="last",
token_counter=count_tokens_approximately,
max_tokens=500,
start_on="human",
)
response = model.invoke(trimmed)
return {"messages": [response]}
graph_w = StateGraph(MessagesState)
graph_w.add_node("chatbot", chatbot_windowed)
graph_w.add_edge(START, "chatbot")
graph_w.add_edge("chatbot", END)
app_windowed = graph_w.compile(checkpointer=InMemorySaver())
# --- 3. Summary memory ---
class SummaryState(TypedDict):
messages: Annotated[list, add_messages]
summary: str
def chatbot_with_summary(state: SummaryState):
messages = state["messages"]
summary = state.get("summary", "")
if summary:
system_msg = SystemMessage(
content=f"Conversation summary: {summary}"
)
recent = trim_messages(
messages,
strategy="last",
token_counter=count_tokens_approximately,
max_tokens=300,
start_on="human",
)
full_context = [system_msg] + recent
else:
full_context = messages
response = model.invoke(full_context)
return {"messages": [response]}
def should_summarize(state: SummaryState):
if len(state["messages"]) > 10:
return "summarize"
return END
def summarize_conversation(state: SummaryState):
messages = state["messages"]
existing = state.get("summary", "")
if existing:
prompt = (
f"Current summary:\n{existing}\n\n"
"Extend this summary with the new messages. "
"Capture key facts, decisions, preferences."
)
else:
prompt = (
"Summarize the conversation so far. "
"Capture key facts, decisions, preferences."
)
summary_msgs = messages + [HumanMessage(content=prompt)]
new_summary = model.invoke(summary_msgs)
delete = [
RemoveMessage(id=m.id) for m in messages[:-2]
]
return {
"summary": new_summary.content,
"messages": delete,
}
graph_s = StateGraph(SummaryState)
graph_s.add_node("chatbot", chatbot_with_summary)
graph_s.add_node("summarize", summarize_conversation)
graph_s.add_edge(START, "chatbot")
graph_s.add_conditional_edges(
"chatbot",
should_summarize,
{"summarize": "summarize", END: END},
)
graph_s.add_edge("summarize", END)
app_summary = graph_s.compile(checkpointer=InMemorySaver())
# --- 4. Long-term memory with Store ---
store = InMemoryStore()
def chatbot_with_memory(
state: MessagesState,
config: RunnableConfig,
*,
store: BaseStore,
):
user_id = config["configurable"]["user_id"]
memories = store.search(namespace=("users", user_id))
memory_text = "\n".join(
f"- {m.key}: {m.value}" for m in memories
)
system_msg = SystemMessage(
content=(
"You are a helpful assistant. "
"User info:\n" + memory_text + "\n"
"If the user shares new preferences, "
"note them as [MEMORY: key=value]."
)
)
response = model.invoke([system_msg] + state["messages"])
return {"messages": [response]}
def save_memories(
state: MessagesState,
config: RunnableConfig,
*,
store: BaseStore,
):
user_id = config["configurable"]["user_id"]
last_message = state["messages"][-1].content
pattern = r"\[MEMORY:\s*(\w+)=(.+?)\]"
matches = re.findall(pattern, last_message)
for key, value in matches:
store.put(
namespace=("users", user_id),
key=key,
value={"fact": value.strip()},
)
return {"messages": []}
graph_lt = StateGraph(MessagesState)
graph_lt.add_node("chatbot", chatbot_with_memory)
graph_lt.add_node("save_memories", save_memories)
graph_lt.add_edge(START, "chatbot")
graph_lt.add_edge("chatbot", "save_memories")
graph_lt.add_edge("save_memories", END)
app_longterm = graph_lt.compile(
checkpointer=InMemorySaver(),
store=store,
)
print("\nLong-term memory test:")
config_lt1 = {
"configurable": {
"thread_id": "session-1",
"user_id": "alex",
}
}
r3 = app_longterm.invoke(
{"messages": [HumanMessage(
content="I prefer dark mode and Python 3.12."
)]},
config=config_lt1,
)
print(r3["messages"][-1].content)
config_lt2 = {
"configurable": {
"thread_id": "session-2",
"user_id": "alex",
}
}
r4 = app_longterm.invoke(
{"messages": [HumanMessage(
content="What do you remember about me?"
)]},
config=config_lt2,
)
print(r4["messages"][-1].content)
print("\nScript completed successfully.")
FAQ
Does InMemorySaver survive a Python restart?
No. It keeps data in RAM only. Kill the process and it’s all gone. For lasting storage, switch to SqliteSaver, PostgresSaver, or any disk-backed option. The interface is the same — only the class name changes.
Can I use long-term memory without a checkpointer?
Yes. They’re fully independent. Compile with store=my_store and skip the checkpointer. The agent won’t track the current chat across calls, but it can still read and write long-term facts from the Store.
How do I delete facts from the Store?
Use store.delete(namespace=("users", "alex"), key="preference") to drop a single item. To wipe all facts for a user, search the namespace first and loop through the results:
items = store.search(namespace=("users", "alex"))
for item in items:
store.delete(namespace=("users", "alex"), key=item.key)
What’s the gap between trim_messages and RemoveMessage?
trim_messages builds a trimmed copy and leaves the real state alone. Use it right before an LLM call. RemoveMessage edits the graph’s state for real — it strips messages from the checkpoint. Use it when you want to cut down what’s stored, like right after a summary step.
Can I do vector search with the Store?
Yes. InMemoryStore has built-in vector search — just pass it an embedding function. Then call store.search(namespace=(...), query="user's coding prefs") and results come back ranked by meaning. Very useful when you have many stored facts and need to grab the most relevant ones.
Summary
Memory is what makes a raw LLM feel like a real agent. LangGraph hands you three ways to build it:
- Short-term memory — a checkpointer keeps message history inside one thread. One line to set up.
- Sliding window / summary memory — controls token costs by trimming or compressing old turns. A must for long-running chats.
- Long-term memory — the Store holds facts across threads and sessions, organized by namespaces.
Most production agents blend short-term and long-term. Toss in summary memory when chats regularly go past 20 rounds.
Practice exercise: Build a chatbot that uses all three types. Store user prefs in a Store, summarize after 8 messages, and trim to 500 tokens before each LLM call. Test across two sessions — the second should recall prefs from the first.
Solution
Blend the production_chatbot function from the “Combining Memory Types” section with save_memories and summarize_conversation. Your graph needs four nodes: chatbot, save_memories, a branch for summarization, and the summarize node. Use SummaryState as the state class. Compile with both checkpointer=InMemorySaver() and store=InMemoryStore().
The key insight: the chatbot node reads from all three sources (Store, summary, trimmed messages), the save_memories node writes to the Store, and the summarize node compresses the message history.