LangGraph Memory: Short-Term, Long-Term & Conversation
Build agents that recall what happened five messages ago, five days ago, and five chats ago — using the right memory type for each job.
Your chatbot nails a tough question. The user follows up with “can you break that down more simply?” — and the bot has no clue what “that” means. Without memory, every call starts fresh. The chat feels broken, and the user bounces.
Memory is the bridge between a raw LLM call and a real agent. LangGraph hands you three kinds. Short-term memory keeps the current chat in one piece. Summary memory squashes old messages so you don’t blow your token budget. Long-term memory stores facts that last across sessions, so the agent knows returning users by name.
Before we build anything, let me lay out how these three fit together.
Every message in a chat lands in state — that’s your short-term memory. But context windows have a ceiling. When the chat grows long, you have two options: chop off old messages with a sliding window, or boil them down into a summary. Both keep the LLM within its token limit.
Long-term memory works in a whole different way. It lives outside the chat thread, inside a dedicated Store. Your agent writes facts there — like “this user prefers Python over JavaScript” — and reads them back in later sessions. The Store spans all threads, so a user who shows up next week gets a custom greeting.
We’ll build each type step by step, then stack them into one agent.
What Counts as Memory in LangGraph?
At its core, memory in LangGraph is just data that sticks around between calls. Nothing abstract — if your agent can read info from a past turn, that’s memory.
LangGraph splits it into two buckets by how far the memory reaches:
- Short-term memory — locked to one thread (one chat). The full message log lives here, and a checkpointer keeps it intact between calls.
- Long-term memory — stretches across threads. A Store holds it, so it carries over from one chat to the next. User prefs, saved facts, learned habits — all fair game.
The import block below sets up everything we need for the rest of this post: the LLM wrapper, message classes, the trim_messages tool for window trimming, and LangGraph’s graph-building and memory pieces.
```python
import os
import re
from typing import Annotated, TypedDict

from langchain_openai import ChatOpenAI
from langchain_core.messages import (
    HumanMessage,
    AIMessage,
    SystemMessage,
    RemoveMessage,
)
from langchain_core.messages.utils import (
    trim_messages,
    count_tokens_approximately,
)
from langchain_core.runnables import RunnableConfig
from langgraph.graph import (
    StateGraph,
    MessagesState,
    START,
    END,
    add_messages,
)
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.store.memory import InMemoryStore
from langgraph.store.base import BaseStore

model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
```
Prerequisites
- Python version: 3.10+
- Required libraries: langgraph (0.4+), langchain-openai (0.3+), langchain-core (0.3+)
- Install: `pip install langgraph langchain-openai langchain-core`
- API key: an OpenAI API key set as `OPENAI_API_KEY`. See OpenAI's docs to create one.
- Time to complete: ~30 minutes
- Prior knowledge: basic LangGraph state management and checkpointing concepts.
KEY INSIGHT: Short-term memory answers “what did we just say?” Long-term memory answers “what do I know about this person from past chats?” Two different questions, two different LangGraph APIs.
How Does Short-Term Memory Work?
Think about why this matters. Without short-term memory, your bot can’t even handle the word “yes” as an answer — because it has no idea what question it asked. Each turn is a fresh start with zero context.
The good news: if you use MessagesState and wire in a checkpointer, you get short-term memory for free. The checkpointer stashes the full message log after each graph run. Next time the same thread fires, those messages load right back in.
Here’s a bare-bones chatbot with this wired up. InMemorySaver keeps everything in Python’s RAM. The thread_id in the config tells LangGraph which chat log to grab.
```python
def chatbot(state: MessagesState):
    response = model.invoke(state["messages"])
    return {"messages": [response]}

graph = StateGraph(MessagesState)
graph.add_node("chatbot", chatbot)
graph.add_edge(START, "chatbot")
graph.add_edge("chatbot", END)

checkpointer = InMemorySaver()
app = graph.compile(checkpointer=checkpointer)
```
When two calls share the same thread_id, they share the message log. Let me prove it — I’ll introduce myself, then ask if the bot remembers.
```python
config = {"configurable": {"thread_id": "user-123"}}

response1 = app.invoke(
    {"messages": [HumanMessage(content="My name is Alex.")]},
    config=config,
)
print(response1["messages"][-1].content)
```
The model says hello to Alex. Now for the test that matters:
```python
response2 = app.invoke(
    {"messages": [HumanMessage(content="What's my name?")]},
    config=config,
)
print(response2["messages"][-1].content)
```
It gets it right — “Alex.” The checkpointer loaded the old messages before this call ran, so the model had full context. Drop the checkpointer and the same question gets a confused “I don’t know.”
That’s the simplest version of short-term memory. But a problem lurks underneath.
Why Can’t You Just Keep Every Message?
Here’s the catch. Every saved message rides along to the LLM on the next call. After 50 back-and-forth rounds, you could be sitting on 100+ messages — thousands of tokens piling up.
Even models with 128K-token windows feel the squeeze. You’re billed by the token. And frankly, most of those old lines are dead weight. The user asked about Python generators 30 turns ago — why would the model need that when the topic has shifted to memory systems?
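A quick back-of-the-envelope calculation makes the growth concrete. This sketch (plain Python, assuming a flat 50 tokens per message, a made-up number) counts the tokens you re-send when every call carries the full history:

```python
# Back-of-the-envelope cost of re-sending the full history each call.
# Assumes a flat 50 tokens per message (a made-up number).
TOKENS_PER_MESSAGE = 50

def total_tokens_sent(turns: int) -> int:
    total = 0
    history = 0
    for _ in range(turns):
        history += 1                           # user message arrives
        total += history * TOKENS_PER_MESSAGE  # whole history goes to the LLM
        history += 1                           # assistant reply is appended
    return total

for n in (10, 50, 100):
    print(n, total_tokens_sent(n))  # 10 -> 5000, 50 -> 125000, 100 -> 500000
```

Token spend grows quadratically with turn count, which is why a fixed window or a summary pays off fast.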
So you need a strategy. LangGraph offers two: a sliding window that chops off old messages, and summary memory that boils them into a short recap.
How Does the Sliding Window Approach Work?
A sliding window is the fastest fix for runaway token counts. The idea is dead simple: before you call the LLM, clip the old messages and only pass the newest ones. The older lines still sit safe in the checkpoint — you just don’t feed them to the model.
LangChain ships a handy trim_messages function for this. Give it the full log, set a token cap, and it gives back a trimmed list. strategy="last" keeps the most recent turns. start_on="human" makes sure the result opens with a human message — models get confused when a conversation kicks off with a stray AI reply.
```python
def chatbot_windowed(state: MessagesState):
    trimmed = trim_messages(
        state["messages"],
        strategy="last",
        token_counter=count_tokens_approximately,
        max_tokens=500,
        start_on="human",
    )
    response = model.invoke(trimmed)
    return {"messages": [response]}
```
The graph setup looks just like before — only the node function changed.
```python
graph_w = StateGraph(MessagesState)
graph_w.add_node("chatbot", chatbot_windowed)
graph_w.add_edge(START, "chatbot")
graph_w.add_edge("chatbot", END)

app_windowed = graph_w.compile(checkpointer=InMemorySaver())
```
What does this look like in action? The bot remembers the latest turns but loses anything that slid past the token cap.
```python
config_w = {"configurable": {"thread_id": "window-demo"}}

app_windowed.invoke(
    {"messages": [HumanMessage(content="My name is Alex.")]},
    config=config_w,
)
app_windowed.invoke(
    {"messages": [HumanMessage(content="I work at a startup.")]},
    config=config_w,
)
app_windowed.invoke(
    {"messages": [HumanMessage(content="We use Python and FastAPI.")]},
    config=config_w,
)
```
After a few rounds, the oldest turns drop out of what the LLM receives. The checkpoint still has the full log — but the model only gets what fits in the 500-token window.
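To make the mechanics concrete, here is a toy sliding window in plain Python. It is not `trim_messages` itself, and it assumes a crude four-characters-per-token estimate, but it shows the core move: walk from newest to oldest and stop once the budget is spent.

```python
# Toy sliding window -- not trim_messages itself.
# Assumes a crude ~4 characters per token.

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def sliding_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the newest messages whose combined size fits the budget."""
    kept, used = [], 0
    for msg in reversed(messages):  # walk newest -> oldest
        cost = approx_tokens(msg)
        if used + cost > max_tokens:
            break  # everything older falls off the window
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order

log = [
    "My name is Alex.", "Hi Alex!",
    "I work at a startup.", "Nice!",
    "We use Python and FastAPI.", "Great stack.",
]
print(sliding_window(log, max_tokens=10))
```

The older turns still exist in `log`; they just never reach the model, which mirrors the checkpoint-versus-window split above.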
TIP: Set `max_tokens` based on your product, not the model's raw limit. A support bot might do fine with 2,000 tokens of history. A coding helper might need 8,000 to keep track of the file it's editing. Start low and raise the bar if the bot forgets too much.
Quick Check: What if you set max_tokens=100 with strategy="last", but the newest message alone takes 150 tokens? By default trim_messages won't split a message (allow_partial=False), so a single over-budget message can leave you with nothing to send. Keep max_tokens comfortably above your largest expected single message, or set allow_partial=True.
Exercise 1: Build a Windowed Chatbot

Create a chatbot node that trims messages to the last 300 tokens using trim_messages. Use count_tokens_approximately as the token counter and set start_on="human". The function should take a MessagesState and return a dict with the model response.

Solution:

```python
def windowed_chatbot(state: MessagesState):
    trimmed = trim_messages(
        state["messages"],
        strategy="last",
        token_counter=count_tokens_approximately,
        max_tokens=300,
        start_on="human",
    )
    response = model.invoke(trimmed)
    return {"messages": [response]}
```

Why it works: trim_messages takes the full message list and returns only the most recent messages that fit within 300 tokens. Setting start_on="human" ensures the trimmed result starts with a human message, which prevents confusing the LLM with an orphaned AI response.
How Does Summary Memory Work?
A sliding window is simple, but it’s a blunt tool. Once messages fall off the edge, that knowledge is lost forever. Summary memory offers a smarter path: before dropping old messages, you ask the LLM to pack them into a tight recap. The recap replaces the old turns — important facts survive, but the token bill shrinks.
To make this work, you define a custom state that carries both the messages and a running summary.
```python
class SummaryState(TypedDict):
    messages: Annotated[list, add_messages]
    summary: str
```
Inside the chatbot node, the logic goes like this: grab the current summary, feed it in as a system message, and trim the history down to just the newest turns. So the model receives two layers of context: a dense summary of everything that came before, plus the last few messages in full detail.
```python
def chatbot_with_summary(state: SummaryState):
    messages = state["messages"]
    summary = state.get("summary", "")

    if summary:
        system_msg = SystemMessage(
            content=f"Conversation summary: {summary}"
        )
        recent = trim_messages(
            messages,
            strategy="last",
            token_counter=count_tokens_approximately,
            max_tokens=300,
            start_on="human",
        )
        full_context = [system_msg] + recent
    else:
        full_context = messages

    response = model.invoke(full_context)
    return {"messages": [response]}
```
So who creates the summary? A dedicated node handles that job. It fires only when messages pile up past a threshold you set. The node asks the LLM to weave new info into the old summary, then clears old messages from state with RemoveMessage. We keep the last 2 messages around so the chat still flows naturally.
```python
def should_summarize(state: SummaryState):
    """Route to summarizer when messages pile up."""
    if len(state["messages"]) > 10:
        return "summarize"
    return END

def summarize_conversation(state: SummaryState):
    messages = state["messages"]
    existing = state.get("summary", "")

    if existing:
        prompt = (
            f"Current summary:\n{existing}\n\n"
            "Extend this summary with the new messages. "
            "Capture key facts, decisions, preferences."
        )
    else:
        prompt = (
            "Summarize the conversation so far. "
            "Capture key facts, decisions, preferences."
        )

    summary_msgs = messages + [HumanMessage(content=prompt)]
    new_summary = model.invoke(summary_msgs)

    # Keep the last 2 messages, remove the rest
    delete = [
        RemoveMessage(id=m.id) for m in messages[:-2]
    ]
    return {
        "summary": new_summary.content,
        "messages": delete,
    }
```
WARNING: `RemoveMessage` does not wipe messages from the checkpoint history. It pulls them out of the live state so the LLM never sees them. Earlier checkpoints still track every message — you can always replay or dig through old turns via the history API.
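The remove-marker idea is easy to sketch in plain Python. This toy reducer (an illustration, not LangGraph's actual `add_messages` implementation) appends normal messages and deletes by id when it sees a remove marker:

```python
from dataclasses import dataclass

# Toy reducer illustrating remove-marker semantics.
# This is a sketch, not LangGraph's add_messages.

@dataclass
class Msg:
    id: str
    content: str

@dataclass
class Remove:
    id: str

def merge(existing: list, updates: list) -> list:
    result = list(existing)
    for u in updates:
        if isinstance(u, Remove):
            # Drop from live state; a real checkpoint history would
            # still hold earlier snapshots containing the message.
            result = [m for m in result if m.id != u.id]
        else:
            result.append(u)
    return result

state = [Msg("1", "hello"), Msg("2", "hi there"), Msg("3", "what's up?")]
state = merge(state, [Remove("1"), Msg("4", "not much")])
print([m.id for m in state])  # ['2', '3', '4']
```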
Now for the wiring. The chatbot runs first, then a conditional edge decides whether to trigger the summary step.
```python
graph_s = StateGraph(SummaryState)
graph_s.add_node("chatbot", chatbot_with_summary)
graph_s.add_node("summarize", summarize_conversation)

graph_s.add_edge(START, "chatbot")
graph_s.add_conditional_edges(
    "chatbot",
    should_summarize,
    {"summarize": "summarize", END: END},
)
graph_s.add_edge("summarize", END)

app_summary = graph_s.compile(checkpointer=InMemorySaver())
```
When the chat crosses 10 messages, the summary node runs by itself. Old turns vanish from state, replaced by a compact recap. The model gets the recap plus the 2 newest lines — rich context with a thin token footprint.
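You can simulate the whole loop without an LLM to convince yourself the state stays bounded. In this sketch, `fake_summarize` stands in for the model call:

```python
# Simulate the summarize-and-prune loop without an LLM.
# fake_summarize stands in for the model call.
THRESHOLD = 10
KEEP_LAST = 2

def fake_summarize(old_summary: str, messages: list[str]) -> str:
    joined = "; ".join(messages)
    return f"{old_summary} | {joined}" if old_summary else joined

summary = ""
messages: list[str] = []
for i in range(1, 25):
    messages.append(f"msg-{i}")
    if len(messages) > THRESHOLD:          # should_summarize fires
        summary = fake_summarize(summary, messages[:-KEEP_LAST])
        messages = messages[-KEEP_LAST:]   # the RemoveMessage step

print(len(messages))  # bounded, never far past the threshold
```

The message list never grows far past the threshold; everything older lives on inside the summary string.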
KEY INSIGHT: Summary memory trades one LLM call now for cheaper calls later. You spend a bit of compute to pack old messages, but every future call uses fewer tokens. For any chat that runs past 20 rounds, that deal almost always comes out ahead.
Exercise 2: Write the Summarization Trigger

Write a should_summarize function that checks whether the message count exceeds a threshold: if len(state["messages"]) is greater than 6, return "summarize". Otherwise, return END.

Solution:

```python
def should_summarize(state: SummaryState):
    if len(state["messages"]) > 6:
        return "summarize"
    return END
```

Why it works: when messages exceed the limit, the function routes to the summarize node. Otherwise, the graph ends normally. You can adjust the threshold based on your token budget.
How Does Long-Term Memory Work with the LangGraph Store?
Short-term memory is locked to a single thread. Start a fresh thread and the agent knows nothing. That works for throwaway chats, but what happens when a user drops in next Tuesday and says “use the same settings as last time”? If all you have is short-term memory, the agent stares back blankly.
That’s the gap long-term memory fills. It relies on the Store — a key-value database that exists outside the graph’s state entirely. You put facts in and get them out whenever you need them. The store sorts entries by namespaces (think folders) and keys (think file names). Because no single thread owns it, data carries over from session to session.
InMemoryStore is the version you use while building. For real traffic, you’d plug in PostgresStore or a MongoDB backend. The calling code doesn’t change.
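If the folder/file analogy helps, the addressing scheme boils down to a mapping keyed by (namespace, key) pairs. This toy dict version (an illustration, not LangGraph's `BaseStore`) shows why a namespace scopes a search:

```python
# Toy dict-backed store -- an illustration, not LangGraph's BaseStore.
# Entries are filed under (namespace, key), like folders and files.

toy_store: dict[tuple, dict] = {}

def put(namespace: tuple, key: str, value: dict) -> None:
    toy_store[(namespace, key)] = value

def search(namespace: tuple) -> list[tuple[str, dict]]:
    # Everything filed in the same "folder" comes back together.
    return [(k, v) for (ns, k), v in toy_store.items() if ns == namespace]

put(("users", "alex"), "language", {"fact": "Python"})
put(("users", "alex"), "framework", {"fact": "FastAPI"})
put(("users", "sam"), "language", {"fact": "Go"})

print(search(("users", "alex")))  # only alex's two facts come back
```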
Here’s the core pattern. You stash items as dicts under a namespace tuple and a string key.
```python
store = InMemoryStore()

store.put(
    ("users", "alex"),
    "preference",
    {"language": "Python", "framework": "FastAPI"},
)

item = store.get(("users", "alex"), "preference")
print(item.value)
```

```
{'language': 'Python', 'framework': 'FastAPI'}
```
You can also search a namespace to grab every fact stored for a given user.
```python
memories = store.search(("users", "alex"))
for memory in memories:
    print(f"{memory.key}: {memory.value}")
```

```
preference: {'language': 'Python', 'framework': 'FastAPI'}
```
NOTE: `InMemoryStore` disappears the moment your Python process dies. For anything real, use `PostgresStore` from `langgraph-checkpoint-postgres` or a Redis/MongoDB backend. Your code stays the same — swap the class and go.
The real magic happens when you connect the Store to a running graph. LangGraph hands the store to your node functions by itself — as long as you compile with a store argument. Nodes pick it up through a keyword-only store parameter.
Below is a chatbot that checks the store for user prefs before writing a reply. The user_id comes from the config — in a production setup, your auth layer would set this.
```python
def chatbot_with_memory(
    state: MessagesState,
    config: RunnableConfig,
    *,
    store: BaseStore,
):
    user_id = config["configurable"]["user_id"]

    memories = store.search(("users", user_id))
    memory_text = "\n".join(
        f"- {m.key}: {m.value}" for m in memories
    )

    system_msg = SystemMessage(
        content=(
            "You are a helpful assistant. "
            "User info:\n" + memory_text + "\n"
            "If the user shares new preferences, "
            "note them as [MEMORY: key=value]."
        )
    )
    response = model.invoke([system_msg] + state["messages"])
    return {"messages": [response]}
```
But who actually writes the memories? A second node takes care of that. It scans the agent’s reply for [MEMORY: key=value] markers and pushes each one into the store.
```python
def save_memories(
    state: MessagesState,
    config: RunnableConfig,
    *,
    store: BaseStore,
):
    user_id = config["configurable"]["user_id"]
    last_message = state["messages"][-1].content

    pattern = r"\[MEMORY:\s*(\w+)=(.+?)\]"
    matches = re.findall(pattern, last_message)

    for key, value in matches:
        store.put(
            ("users", user_id),
            key,
            {"fact": value.strip()},
        )
    return {"messages": []}
```
Now compile the graph and pass in both a checkpointer (for the live chat) and the store (for lasting facts).
```python
graph_lt = StateGraph(MessagesState)
graph_lt.add_node("chatbot", chatbot_with_memory)
graph_lt.add_node("save_memories", save_memories)

graph_lt.add_edge(START, "chatbot")
graph_lt.add_edge("chatbot", "save_memories")
graph_lt.add_edge("save_memories", END)

app_longterm = graph_lt.compile(
    checkpointer=InMemorySaver(),
    store=store,
)
```
Time to prove it works across sessions. In session 1, the user shares a couple of prefs. In session 2 (brand-new thread, same user), the agent should bring those prefs back.
```python
config1 = {
    "configurable": {
        "thread_id": "session-1",
        "user_id": "alex",
    }
}

r1 = app_longterm.invoke(
    {"messages": [HumanMessage(
        content="I prefer dark mode and Python 3.12."
    )]},
    config=config1,
)
print(r1["messages"][-1].content)
```
The agent replies, weaving [MEMORY: ...] tags into its response. The save_memories node catches those tags and writes each fact to the Store.
```python
config2 = {
    "configurable": {
        "thread_id": "session-2",
        "user_id": "alex",
    }
}

r2 = app_longterm.invoke(
    {"messages": [HumanMessage(
        content="What do you remember about me?"
    )]},
    config=config2,
)
print(r2["messages"][-1].content)
```
The agent recalls everything from session 1, even though this is a completely separate thread. That’s the Store doing its job.
TIP: Stick with tidy, fixed keys — not free-form text. Save facts under names like `language_preference`, `framework`, `timezone`. Clean keys make lookups easy and your code far simpler to debug.
Exercise 3: Store and Retrieve a User Preference

Using an InMemoryStore, store a memory under namespace ("users", "bob") with key "editor" and value {"tool": "VS Code"}. Then retrieve it using store.get() and print the value dict.

Solution:

```python
from langgraph.store.memory import InMemoryStore

store = InMemoryStore()
store.put(("users", "bob"), "editor", {"tool": "VS Code"})

item = store.get(("users", "bob"), "editor")
print(item.value)
```

Why it works: put stores a dict under a namespace and key, and get retrieves it by the same pair. Namespaces are tuples of strings that act like folder paths, scoping memories to specific users or contexts.
How Do the Memory Types Stack Up?
| Feature | Short-Term (Checkpointer) | Sliding Window | Summary Memory | Long-Term (Store) |
|---|---|---|---|---|
| Scope | Single thread | Single thread | Single thread | Cross-thread |
| Persistence | Until thread ends | Until thread ends | Until thread ends | Indefinite |
| Token cost | Grows with chat | Fixed ceiling | Fixed ceiling | Minimal (on demand) |
| Info loss | None | Old turns dropped | Packed down, some detail gone | None |
| Best for | Short chats | Support Q&A | Long brainstorms | User profiles, prefs |
| Setup work | One line (checkpointer) | Add trim_messages | Custom state + summary node | Store + read/write nodes |
KEY INSIGHT: Most real agents blend at least two types. Short-term for the live thread, long-term for user context across sessions. Summary memory steps in only when chats often run past 20 rounds.
How Do You Layer All Three in a Single Agent?
In the real world, agents don’t rely on just one memory type. They stack them. Here’s the combo I’d pick for most production bots: the Store for lasting user facts, summary memory to pack down old turns, and a sliding window as the last line of defense before the LLM call.
The state carries both the message log and a summary string. Inside the node, you pull long-term facts from the Store, feed in the running summary as context, and trim the message list to the most recent turns.
```python
def production_chatbot(
    state: SummaryState,
    config: RunnableConfig,
    *,
    store: BaseStore,
):
    user_id = config["configurable"]["user_id"]
    summary = state.get("summary", "")

    # 1. Load long-term memories
    memories = store.search(("users", user_id))
    mem_text = "\n".join(
        f"- {m.key}: {m.value}" for m in memories
    )

    # 2. Build system context
    parts = ["You are a helpful assistant."]
    if mem_text:
        parts.append(f"User profile:\n{mem_text}")
    if summary:
        parts.append(f"Conversation so far:\n{summary}")
    system_msg = SystemMessage(content="\n\n".join(parts))

    # 3. Trim recent messages
    recent = trim_messages(
        state["messages"],
        strategy="last",
        token_counter=count_tokens_approximately,
        max_tokens=1000,
        start_on="human",
    )

    response = model.invoke([system_msg] + recent)
    return {"messages": [response]}
```
This single function draws from all three wells: the Store (lasting facts), the summary (packed history), and the trimmed message log (fresh context). The LLM sees the whole picture with none of the token bloat.
UNDER-THE-HOOD: The order of what you feed the model makes a difference. System message goes first, then the recent turns. The LLM reads the system message as stable background and the recent turns as the active chat. Placing the summary in the system message (not as a user message) keeps the model from trying to “answer” the summary.
When Should You Use Which Memory Type?
The choice isn’t about what sounds fancy — it’s about what your users actually run into.
Short-term memory alone is enough when chats wrap up in a few turns. A Q&A bot that handles 3-5 exchanges? Plug in a checkpointer and call it done.
Add a sliding window when chats could stretch long but the early bits lose relevance fast. Customer support is the classic case. The fix almost always hangs on the last handful of messages, not the greeting from twenty minutes ago.
Bring in summary memory when old context still shapes the outcome, but you can’t afford to keep every word. Picture a planning session where a design choice from an hour ago still matters — but the exact phrasing doesn’t.
Reach for long-term memory when users return across sessions. Personal assistants, tutoring platforms, and enterprise copilots all need this. The moment a user says “use the same format as last time” and your agent has no answer, you know it’s time for a Store.
WARNING: Don’t pile on memory you don’t need. Each layer brings more moving parts — extra state, extra nodes, extra places to break. Start with the simplest option that does the job. Layer up only when your users hit the limits.
What Mistakes Should You Watch Out For?
Mistake 1: Skipping the checkpointer
Wrong:
```python
app = graph.compile()  # No checkpointer!
```
The problem: Nothing survives between calls. Each invoke() starts with a blank message list. Your bot has total amnesia.
Fix:
```python
app = graph.compile(checkpointer=InMemorySaver())
```
Mistake 2: Dumping the full message log into long chats
Wrong:
```python
def chatbot(state: MessagesState):
    response = model.invoke(state["messages"])
    return {"messages": [response]}
```
The problem: After 50 rounds, state["messages"] could pack thousands of tokens. You’ll blow the context window, and your API bill goes through the roof.
Fix:
```python
def chatbot(state: MessagesState):
    trimmed = trim_messages(
        state["messages"],
        strategy="last",
        token_counter=count_tokens_approximately,
        max_tokens=2000,
    )
    response = model.invoke(trimmed)
    return {"messages": [response]}
```
Mistake 3: Sharing one thread_id across users
Wrong:
```python
config = {"configurable": {"thread_id": "main"}}
# Alice and Bob both use "main" — they see each other's messages!
```
The problem: Thread IDs wall off short-term memory. Same ID = shared log. That’s a privacy hole.
Fix:
```python
config = {"configurable": {"thread_id": f"user-{user_id}"}}
```
Mistake 4: Confusing Store namespaces with thread IDs
The Store and the checkpointer are totally separate. Thread IDs fence off short-term memory (messages). Namespaces fence off long-term memory (facts).
```python
# Short-term: scoped by thread_id
config = {"configurable": {"thread_id": "session-42"}}

# Long-term: scoped by namespace
store.put(("users", "alex"), "pref", {...})
```
A thread ID rotates with each session. A namespace stays put for the same user across every session. Confuse the two and your data lands in the wrong place.
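A toy sketch makes the two scopes hard to confuse. Short-term state files under the thread id, long-term facts under the user namespace, so a new thread starts empty while the facts survive:

```python
# Two separate scopes, sketched as plain dicts:
#   checkpoints -> keyed by thread_id        (one chat's message log)
#   fact_store  -> keyed by (namespace, key) (lasting user facts)

checkpoints: dict[str, list[str]] = {}
fact_store: dict[tuple, dict] = {}

def chat(thread_id: str, user_id: str, text: str) -> None:
    # Short-term: messages accumulate under this thread only.
    checkpoints.setdefault(thread_id, []).append(text)
    # Long-term: facts are filed under the user, not the thread.
    fact_store[(("users", user_id), "last_topic")] = {"fact": text}

chat("session-1", "alex", "I like dark mode")
chat("session-2", "alex", "What do you remember?")  # new thread, same user

print(len(checkpoints["session-2"]))  # the new thread starts fresh
print(fact_store[(("users", "alex"), "last_topic")])  # facts outlive the thread
```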
Predict the output: You have a graph with summary memory. The summary says “User is building a FastAPI app.” The last 2 messages talk about database choices. The user asks “What framework am I using?” Will the agent know?
Yes — because the summary gets sent as a system message. Even though the framework talk happened many turns ago, the summary held onto that fact.
Frequently Asked Questions
Does InMemorySaver survive a Python restart?
No. It lives in RAM. Kill the process and the data goes with it. If your data needs to last, reach for SqliteSaver, PostgresSaver, or any disk-backed saver. The API is identical — only the class name swaps out.
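The in-memory versus disk-backed difference is just where the bytes live. This sketch uses a JSON file as a stand-in for a disk-backed saver (it is not `SqliteSaver`); the loaded data would survive a process restart where a plain dict would not:

```python
import json
import os
import tempfile

# A JSON file stands in for a disk-backed saver. This is NOT
# SqliteSaver -- just the persistence idea.

fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)
os.remove(path)  # start from a clean slate

def save(thread_id: str, messages: list[str]) -> None:
    data = {}
    if os.path.exists(path):
        with open(path) as f:
            data = json.load(f)
    data[thread_id] = messages
    with open(path, "w") as f:
        json.dump(data, f)  # survives a process restart

def load(thread_id: str) -> list[str]:
    if not os.path.exists(path):
        return []  # an in-memory dict would be empty after every restart
    with open(path) as f:
        return json.load(f).get(thread_id, [])

save("user-123", ["My name is Alex."])
print(load("user-123"))  # ['My name is Alex.']
```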
Can I use the Store without a checkpointer?
Absolutely. The Store and the checkpointer run as two independent pieces. Compile a graph with store=my_store and no checkpointer if you want. The agent won’t remember the live chat between calls, but it will still read and write lasting facts through the Store.
How do I remove facts from the Store?
Call store.delete(("users", "alex"), "preference") to drop a single entry. To clear out everything for a user, scan the namespace first, then delete each result:

```python
items = store.search(("users", "alex"))
for item in items:
    store.delete(("users", "alex"), item.key)
```
How is trim_messages different from RemoveMessage?
They do different jobs. trim_messages creates a shorter copy of the log for a single LLM call — the state itself stays untouched. Reach for it when you need a one-off trim right before invoking the model. RemoveMessage, on the other hand, edits the graph's live state for real: messages leave the current state, though earlier checkpoints in the thread's history still contain them. Use it when you want a smaller stored state — typically right after running a summary.
Can I do semantic search on the Store?
Yes. InMemoryStore can handle vector lookups when you wire in an embedding function. You’d write store.search(namespace=(...), query="user's coding style") and get results ranked by meaning. Great when you’ve saved dozens of facts and only need the most relevant ones.
Summary
Memory is what turns a stateless LLM call into a real agent. LangGraph gives you three ways to get there:
- Short-term memory — a checkpointer saves the message log within one thread. Takes one line to set up.
- Sliding window / summary memory — trims or packs old messages to keep your token budget under control. Essential once chats run long.
- Long-term memory — the Store keeps facts alive across threads and sessions, sorted by namespaces.
In practice, most agents blend short-term and long-term memory. Layer in summary memory when your chats regularly cross the 20-turn mark.
Practice exercise: Build a chatbot that layers all three types. It should store user prefs in a Store, summarize after 8 messages, and trim to 500 tokens before each LLM call. Test it across two sessions — the second should recall prefs from the first.
References
- LangGraph documentation — Memory overview.
- LangGraph documentation — How to add memory.
- LangGraph documentation — Long-term memory with Store.
- LangChain blog — Launching Long-Term Memory Support in LangGraph.
- LangChain documentation — `trim_messages` utility.
- LangGraph API Reference — `InMemoryStore`.
- LangGraph documentation — `RemoveMessage`.
- Harrison Chase — Long-Term Agentic Memory with LangGraph (DeepLearning.AI).