LangGraph Memory Systems — Short-Term, Long-Term, and Conversation Memory Explained
Your chatbot nails a tough question. The user asks “can you say that more simply?” — and the bot has no idea what “that” means. Without memory, each call starts blank. The chat feels broken, and the user gives up.
Memory is the fix. It turns a raw LLM call into a smooth, aware agent. LangGraph ships three kinds of memory for this. Short-term memory keeps the current chat intact. Summary memory squeezes old messages so you don’t blow your token budget. Long-term memory saves facts across sessions so your agent knows returning users.
Before we build, let’s see how the pieces connect.
Every message in a chat lands in state — that’s short-term memory. But context windows have a ceiling. When the chat grows long, you have two options: trim old messages with a sliding window, or crush them into a summary. Both keep the LLM under its token limit.
Long-term memory works in a totally different way. It lives outside the chat thread, in a thing called a Store. Your agent writes facts there — “this user likes Python over JavaScript” — and reads them back in later sessions. The Store lives across threads, so a user who shows up next week gets a custom experience.
We’ll build each type step by step, then blend them into one agent.
What Is Memory in LangGraph?
In LangGraph, “memory” simply means data your agent can read from earlier turns. There’s no magic layer — it’s just info that outlasts a single invoke() call.
The framework draws a line between two scopes:
- Short-term memory — locked to one thread (one chat). It’s the running list of messages. A checkpointer writes it to storage after each graph run.
- Long-term memory — crosses thread lines. It sits in a Store and persists across totally separate chats. Great for user prefs, learned facts, or skills the agent picks up over time.
Below is every import we’ll touch in this guide — the LLM wrapper, message types, the trim_messages helper for capping context, and LangGraph’s graph-building and memory tools.
import os
import re
from typing import Annotated, TypedDict
from langchain_openai import ChatOpenAI
from langchain_core.messages import (
HumanMessage,
AIMessage,
SystemMessage,
RemoveMessage,
)
from langchain_core.messages.utils import (
trim_messages,
count_tokens_approximately,
)
from langchain_core.runnables import RunnableConfig
from langgraph.graph import (
StateGraph,
MessagesState,
START,
END,
add_messages,
)
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.store.memory import InMemoryStore
from langgraph.store.base import BaseStore
model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
Before You Start
- Python: 3.10+
- Packages: langgraph (0.4+), langchain-openai (0.3+), langchain-core (0.3+)
- Install: pip install langgraph langchain-openai langchain-core
- API key: Set OPENAI_API_KEY in your shell. See OpenAI’s docs to create one.
- Time: ~30 minutes
- Background: Basic LangGraph state and checkpointing ideas.
KEY INSIGHT: Short-term memory handles “what did we just say?” Long-term memory handles “what do I already know about this person?” Two different jobs, two different LangGraph APIs.
How Does Short-Term Memory Work?
Why does this matter? Without it, your bot can’t even handle “yes” as a follow-up. Each turn is a clean slate.
This is the simplest memory type. If your graph uses MessagesState and you attach a checkpointer, you already have it. The checkpointer writes the full message list to storage after each run. Next time the same thread fires the graph, those messages reload on their own.
Here’s a minimal chatbot with short-term memory. InMemorySaver holds state in RAM. The thread_id in the config tells LangGraph which chat to open.
def chatbot(state: MessagesState):
response = model.invoke(state["messages"])
return {"messages": [response]}
graph = StateGraph(MessagesState)
graph.add_node("chatbot", chatbot)
graph.add_edge(START, "chatbot")
graph.add_edge("chatbot", END)
checkpointer = InMemorySaver()
app = graph.compile(checkpointer=checkpointer)
When two calls use the same thread_id, they share a single message timeline. Let’s test: send a fact, then ask a question that only works if the bot kept that fact.
config = {"configurable": {"thread_id": "user-123"}}
response1 = app.invoke(
{"messages": [HumanMessage(content="My name is Alex.")]},
config=config,
)
print(response1["messages"][-1].content)
The model says hello to Alex. Now for the real test:
response2 = app.invoke(
{"messages": [HumanMessage(content="What's my name?")]},
config=config,
)
print(response2["messages"][-1].content)
It answers “Alex” — because the checkpointer restored all earlier messages before this run kicked off. Without that one-line setup, the bot would know nothing.
That’s short-term memory in a nutshell. Simple, but there’s a catch lurking underneath.
Why Can’t You Keep Every Message?
Every saved message gets sent to the LLM on the next call. After 50 rounds of back and forth, your message list could hold 100+ entries. That’s thousands of tokens — and you’re paying for every one.
Even models with 128K context windows don’t make this okay. Most old turns are noise by now. The user asked about Python generators ages ago — is that still useful when they’re asking about memory today?
You need a strategy. LangGraph gives you two: sliding windows (chop old messages) and summary memory (compress them).
How Does Sliding Window Memory Work?
This is the easiest approach. Before calling the LLM, you chop off the oldest messages and keep only the freshest ones. Those old messages still live in the checkpoint — you just don’t feed them to the model.
LangChain’s trim_messages handles the work. Give it the full message list and a token budget, and it hands back a trimmed copy. strategy="last" keeps the newest turns. start_on="human" makes sure the trimmed output always opens with a user message — models act weird when a chat starts with a half-finished AI reply.
def chatbot_windowed(state: MessagesState):
trimmed = trim_messages(
state["messages"],
strategy="last",
token_counter=count_tokens_approximately,
max_tokens=500,
start_on="human",
)
response = model.invoke(trimmed)
return {"messages": [response]}
The rest of the graph stays the same — only the node function changes.
graph_w = StateGraph(MessagesState)
graph_w.add_node("chatbot", chatbot_windowed)
graph_w.add_edge(START, "chatbot")
graph_w.add_edge("chatbot", END)
app_windowed = graph_w.compile(checkpointer=InMemorySaver())
What does this look like in action? The bot recalls fresh messages but forgets anything that aged out of the token window.
config_w = {"configurable": {"thread_id": "window-demo"}}
app_windowed.invoke(
{"messages": [HumanMessage(content="My name is Alex.")]},
config=config_w,
)
app_windowed.invoke(
{"messages": [HumanMessage(content="I work at a startup.")]},
config=config_w,
)
app_windowed.invoke(
{"messages": [HumanMessage(content="We use Python and FastAPI.")]},
config=config_w,
)
After a few more turns, the oldest messages vanish from the LLM’s view. The checkpoint keeps everything — the model just sees what fits in 500 tokens.
TIP: Pick your max_tokens based on the job, not the model’s limit. A support bot might need 2,000 tokens of history. A coding helper might need 8,000 to keep track of the function you’re building. Start low and raise it if the bot forgets too much.
Quick Check: What if max_tokens=100 and the user’s latest message alone is 150 tokens? With the default allow_partial=False, trim_messages only returns whole messages that fit the budget — so it can come back empty. If some context must survive an over-tight budget, raise max_tokens or set allow_partial=True to keep a truncated slice of the newest message.
Exercise 1: Build a Windowed Chatbot (intermediate)
Create a chatbot node that trims messages to the last 300 tokens using trim_messages. Use count_tokens_approximately as the token counter and set start_on="human". The function should take a MessagesState and return a dict with the model response.
Starter code:
def windowed_chatbot(state: MessagesState):
    # Trim messages to last 300 tokens
    trimmed = trim_messages(
        state["messages"],
        # Add the missing parameters here
    )
    response = model.invoke(trimmed)
    return {"messages": [response]}
Hint: you need three keyword arguments (strategy="last", token_counter=count_tokens_approximately, max_tokens=300), plus start_on="human" so the trimmed list opens with a human message.
Solution:
def windowed_chatbot(state: MessagesState):
    trimmed = trim_messages(
        state["messages"],
        strategy="last",
        token_counter=count_tokens_approximately,
        max_tokens=300,
        start_on="human",
    )
    response = model.invoke(trimmed)
    return {"messages": [response]}
The trim_messages function takes the full message list and returns only the most recent messages that fit within 300 tokens. Setting start_on="human" ensures the trimmed result starts with a human message, which prevents confusing the LLM with an orphaned AI response.
How Does Summary Memory Work?
Windows are simple but lossy. Once a message drops off, that info is gone forever. Summary memory splits the difference: before throwing out old turns, you ask the LLM to boil them down. A short recap takes their place — key facts kept, token count slashed.
You’ll need a custom state that holds both the message list and a running summary string.
class SummaryState(TypedDict):
messages: Annotated[list, add_messages]
summary: str
The chatbot node checks for a saved summary. If one exists, it pastes it at the top as a system message and trims to just the latest turns. The LLM then sees two things: a compressed recap of older turns, plus the freshest messages in full.
def chatbot_with_summary(state: SummaryState):
messages = state["messages"]
summary = state.get("summary", "")
if summary:
system_msg = SystemMessage(
content=f"Conversation summary: {summary}"
)
recent = trim_messages(
messages,
strategy="last",
token_counter=count_tokens_approximately,
max_tokens=300,
start_on="human",
)
full_context = [system_msg] + recent
else:
full_context = messages
response = model.invoke(full_context)
return {"messages": [response]}
So who does the summarizing? A separate node. It fires when the message count goes over a threshold. This node asks the LLM to blend the new turns into the running recap, then uses RemoveMessage to strip old entries from state. We hold on to the last 2 messages so the flow feels natural.
def should_summarize(state: SummaryState):
"""Route to summarizer when messages pile up."""
if len(state["messages"]) > 10:
return "summarize"
return END
def summarize_conversation(state: SummaryState):
messages = state["messages"]
existing = state.get("summary", "")
if existing:
prompt = (
f"Current summary:\n{existing}\n\n"
"Extend this summary with the new messages. "
"Capture key facts, decisions, preferences."
)
else:
prompt = (
"Summarize the conversation so far. "
"Capture key facts, decisions, preferences."
)
summary_msgs = messages + [HumanMessage(content=prompt)]
new_summary = model.invoke(summary_msgs)
# Keep last 2 messages, remove the rest
delete = [
RemoveMessage(id=m.id) for m in messages[:-2]
]
return {
"summary": new_summary.content,
"messages": delete,
}
WARNING: RemoveMessage doesn’t erase anything from the checkpoint log. It just drops messages from the live state so the LLM doesn’t see them. The full history stays in the checkpoint — you can still replay or inspect every past turn.
Now wire the pieces: chatbot runs first, then a branch decides if the summary node should fire.
graph_s = StateGraph(SummaryState)
graph_s.add_node("chatbot", chatbot_with_summary)
graph_s.add_node("summarize", summarize_conversation)
graph_s.add_edge(START, "chatbot")
graph_s.add_conditional_edges(
"chatbot",
should_summarize,
{"summarize": "summarize", END: END},
)
graph_s.add_edge("summarize", END)
app_summary = graph_s.compile(checkpointer=InMemorySaver())
Once the count crosses 10, the summary node kicks in by itself. Old turns vanish, replaced by a tight recap. The LLM gets the recap plus the 2 newest messages — full awareness without the bloated token bill.
KEY INSIGHT: Summary memory swaps compute for context. You burn one extra LLM call to shrink old turns, but you save tokens on every future call. For chats that run past 20 rounds, the trade-off is almost always worth it.
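A quick back-of-envelope sketch shows the shape of that trade-off. Every number below is an illustrative assumption, not a measurement:

```python
# Back-of-envelope token math (all values are invented for illustration).
turns = 30                 # human/AI round trips so far
tokens_per_turn = 80       # average tokens per round trip
summary_tokens = 150       # size of the compressed recap
recent_kept = 2            # messages kept verbatim after summarizing

full_history = turns * tokens_per_turn
with_summary = summary_tokens + recent_kept * (tokens_per_turn // 2)

print(f"full history: {full_history} tokens per call")
print(f"summary + recent: {with_summary} tokens per call")
```

Under these assumptions the summarized context is roughly a tenth the size of the raw history, and the gap widens with every additional turn.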
Exercise 2: Write the Summarization Trigger (intermediate)
Write a should_summarize function that checks whether the message count exceeds a threshold. If len(state["messages"]) is greater than 6, return "summarize". Otherwise, return END. The function takes a SummaryState dict.
Starter code:
from langgraph.graph import END

def should_summarize(state: SummaryState):
    # Check message count and return route
    pass
Hint: compare len(state["messages"]) to 6 with an if statement.
Solution:
def should_summarize(state: SummaryState):
    if len(state["messages"]) > 6:
        return "summarize"
    return END
The function checks the message count against the threshold. When messages exceed the limit, it routes to the summarize node; otherwise the graph ends normally. You can adjust the threshold based on your token budget.
How Does Long-Term Memory Work with the Store?
Short-term memory is locked to one thread. Start a new thread and the agent has a blank slate. Fine for quick chats. But when a user returns next week and says “use the same settings as last time,” the agent draws a total blank.
That’s where long-term memory comes in. It relies on the Store — a key-value database that lives outside the graph’s state. You put facts in and get them out whenever you want. The Store sorts data by namespaces (think: folders) and keys (think: file names). Because it sits outside any single thread, the data survives across sessions.
InMemoryStore is the dev-friendly version. In production you’d plug in PostgresStore or a MongoDB backend. The code looks the same either way.
Here’s the basic pattern. Save items as dicts under a namespace tuple and a string key.
store = InMemoryStore()
store.put(
namespace=("users", "alex"),
key="preference",
value={"language": "Python", "framework": "FastAPI"},
)
item = store.get(
namespace=("users", "alex"), key="preference"
)
print(item.value)
{'language': 'Python', 'framework': 'FastAPI'}
Need all the facts for one user? Search the namespace.
memories = store.search(namespace=("users", "alex"))
for memory in memories:
print(f"{memory.key}: {memory.value}")
preference: {'language': 'Python', 'framework': 'FastAPI'}
NOTE: InMemoryStore dies with your Python process. For real apps, swap in PostgresStore (from langgraph-checkpoint-postgres) or a Redis/MongoDB backend. Your code won’t change — only the class name does.
The real magic shows when you hook the Store into a graph. Pass store=... at compile time and LangGraph injects it into your node functions for you. Nodes access it through a keyword-only store argument.
Below is a chatbot that looks up user prefs in the Store before it responds. The user_id comes from the config — in a real app, your auth layer would set it.
def chatbot_with_memory(
state: MessagesState,
config: RunnableConfig,
*,
store: BaseStore,
):
user_id = config["configurable"]["user_id"]
memories = store.search(namespace=("users", user_id))
memory_text = "\n".join(
f"- {m.key}: {m.value}" for m in memories
)
system_msg = SystemMessage(
content=(
"You are a helpful assistant. "
"User info:\n" + memory_text + "\n"
"If the user shares new preferences, "
"note them as [MEMORY: key=value]."
)
)
response = model.invoke([system_msg] + state["messages"])
return {"messages": [response]}
Who writes those facts to disk? A second node. It scans the agent’s output for [MEMORY: key=value] tags and pushes each match into the Store.
def save_memories(
state: MessagesState,
config: RunnableConfig,
*,
store: BaseStore,
):
user_id = config["configurable"]["user_id"]
last_message = state["messages"][-1].content
pattern = r"\[MEMORY:\s*(\w+)=(.+?)\]"
matches = re.findall(pattern, last_message)
for key, value in matches:
store.put(
namespace=("users", user_id),
key=key,
value={"fact": value.strip()},
)
return {"messages": []}
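You can sanity-check the tag regex on its own before wiring it into the graph. The reply text below is invented to mimic what the system prompt asks the model to emit:

```python
import re

# A hypothetical model reply containing [MEMORY: key=value] tags.
sample_reply = (
    "Noted! I'll remember those preferences. "
    "[MEMORY: theme=dark mode] [MEMORY: python_version=3.12]"
)

# Same pattern as save_memories: a word-character key, then a lazy
# match for the value up to the closing bracket.
pattern = r"\[MEMORY:\s*(\w+)=(.+?)\]"
matches = re.findall(pattern, sample_reply)
print(matches)  # [('theme', 'dark mode'), ('python_version', '3.12')]
```

Note that the key must be a single \w+ token (letters, digits, underscores), which nudges the model toward clean identifiers like python_version rather than free-form phrases.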
When you compile, pass both the checkpointer (for short-term chat history) and the store (for long-term facts).
graph_lt = StateGraph(MessagesState)
graph_lt.add_node("chatbot", chatbot_with_memory)
graph_lt.add_node("save_memories", save_memories)
graph_lt.add_edge(START, "chatbot")
graph_lt.add_edge("chatbot", "save_memories")
graph_lt.add_edge("save_memories", END)
app_longterm = graph_lt.compile(
checkpointer=InMemorySaver(),
store=store,
)
Let’s prove it works across sessions. In session 1 we’ll teach the agent a fact. In session 2 (fresh thread, same user) we’ll ask for it back.
config1 = {
"configurable": {
"thread_id": "session-1",
"user_id": "alex",
}
}
r1 = app_longterm.invoke(
{"messages": [HumanMessage(
content="I prefer dark mode and Python 3.12."
)]},
config=config1,
)
print(r1["messages"][-1].content)
The agent sees the prefs and marks them with [MEMORY: ...] tags in its reply. The save_memories node picks up those tags and pushes them into the Store.
config2 = {
"configurable": {
"thread_id": "session-2",
"user_id": "alex",
}
}
r2 = app_longterm.invoke(
{"messages": [HumanMessage(
content="What do you remember about me?"
)]},
config=config2,
)
print(r2["messages"][-1].content)
The agent recalls dark mode and Python 3.12 — on a thread it has never seen before. That’s cross-session memory doing its job.
TIP: Use clean, structured keys — not free-form blobs. Names like language_preference, framework, timezone make lookups solid and debugging painless.
Exercise 3: Store and Retrieve a User Preference (intermediate)
Using an InMemoryStore, store a memory with namespace ("users", "bob"), key "editor", and value {"tool": "VS Code"}. Then retrieve it with store.get() and print the value dict.
Starter code:
from langgraph.store.memory import InMemoryStore

store = InMemoryStore()

# Store the memory
# YOUR CODE HERE

# Retrieve and print it
# YOUR CODE HERE
Solution:
from langgraph.store.memory import InMemoryStore

store = InMemoryStore()
store.put(namespace=("users", "bob"), key="editor", value={"tool": "VS Code"})

item = store.get(namespace=("users", "bob"), key="editor")
print(item.value)
The put method stores a dict under a namespace and key; get retrieves it by the same namespace and key. Namespaces are tuples of strings that act like folder paths, scoping memories to specific users or contexts.
How Do the Memory Types Compare?
| Feature | Short-Term (Checkpointer) | Sliding Window | Summary Memory | Long-Term (Store) |
|---|---|---|---|---|
| Scope | Single thread | Single thread | Single thread | Cross-thread |
| Persistence | Until thread ends | Until thread ends | Until thread ends | Stays forever |
| Token cost | Grows with chat | Fixed cap | Fixed cap | Tiny (on demand) |
| Info loss | None | Old messages gone | Compressed, some detail lost | None |
| Best for | Short chats | Support, Q&A | Long brainstorms | User profiles, prefs |
| Setup | One line (checkpointer) | Add trim_messages | Custom state + summarize node | Store + read/write nodes |
KEY INSIGHT: Most shipped agents layer at least two types. Short-term for the active thread, long-term for user facts across visits. Summary memory is worth adding only when chats regularly run past 20 turns.
How Do You Combine Memory Types in Production?
In practice, you don’t choose just one — you stack them. Here’s the recipe I’d use for most apps: the Store for lasting user facts, summary memory to compress aging turns, and a sliding window as the final safety net before the LLM call.
The state carries both messages and a summary. The node loads long-term facts from the Store, drops them in as context, tacks on the running recap, and trims to the freshest turns.
def production_chatbot(
state: SummaryState,
config: RunnableConfig,
*,
store: BaseStore,
):
user_id = config["configurable"]["user_id"]
summary = state.get("summary", "")
# 1. Load long-term memories
memories = store.search(namespace=("users", user_id))
mem_text = "\n".join(
f"- {m.key}: {m.value}" for m in memories
)
# 2. Build system context
parts = ["You are a helpful assistant."]
if mem_text:
parts.append(f"User profile:\n{mem_text}")
if summary:
parts.append(f"Conversation so far:\n{summary}")
system_msg = SystemMessage(content="\n\n".join(parts))
# 3. Trim recent messages
recent = trim_messages(
state["messages"],
strategy="last",
token_counter=count_tokens_approximately,
max_tokens=1000,
start_on="human",
)
response = model.invoke([system_msg] + recent)
return {"messages": [response]}
One function, three data sources: the Store (lasting facts), the summary string (compressed older turns), and the trimmed list (live messages). The LLM gets the whole picture without burning extra tokens.
UNDER-THE-HOOD: The order matters here. System message comes first, then the recent turns. The LLM reads the system message as background context and the recent turns as the active chat. Placing the summary in the system prompt (rather than as a user turn) stops the model from trying to “answer” the summary itself.
When Should You Use Each Memory Type?
The right choice depends on what your app actually does — not on how clever the setup looks.
Short-term only works when chats are brief. A quick Q&A bot that wraps up in 3–5 turns needs nothing else. Plug in a checkpointer and move on.
Add a sliding window when chats could stretch but early turns rarely matter. Support bots fit this well. The answer usually hangs on the last few messages, not the hello from 20 minutes ago.
Add summary memory when old context still matters but you can’t afford to keep every message. Think project-planning sessions where a choice from an hour ago still shapes the outcome — but the exact phrasing doesn’t.
Add long-term memory when users return across sessions. Personal helpers, tutoring apps, and enterprise copilots live here. If a user says “use the same format as last time” and your agent is lost, you need a Store.
WARNING: Don’t add memory layers you don’t need. Each one means more state, more nodes, and more things that can break. Pick the simplest setup that works. Stack more only when your users run into limits.
What Are the Most Common Memory Mistakes?
Mistake 1: No checkpointer
Wrong:
app = graph.compile() # No checkpointer!
What goes wrong: State doesn’t survive between calls. Every invoke() begins with an empty message list. The bot has zero recall.
Fix:
app = graph.compile(checkpointer=InMemorySaver())
Mistake 2: Feeding the full history to the LLM
Wrong:
def chatbot(state: MessagesState):
response = model.invoke(state["messages"])
return {"messages": [response]}
What goes wrong: After 50 turns, state["messages"] might hold thousands of tokens. You blow past the context limit and your API costs spike.
Fix:
def chatbot(state: MessagesState):
trimmed = trim_messages(
state["messages"],
strategy="last",
token_counter=count_tokens_approximately,
max_tokens=2000,
)
response = model.invoke(trimmed)
return {"messages": [response]}
Mistake 3: Sharing thread_id across users
Wrong:
config = {"configurable": {"thread_id": "main"}}
# Alice and Bob both use "main" — they see each other's messages!
What goes wrong: Thread IDs control short-term memory scope. Same ID = shared history. That’s a privacy leak waiting to happen.
Fix:
config = {"configurable": {"thread_id": f"user-{user_id}"}}
Mistake 4: Mixing up Store namespaces and thread IDs
These are two independent systems. Thread IDs gate short-term memory (the message list). Store namespaces gate long-term memory (saved facts).
# Short-term: scoped by thread_id
config = {"configurable": {"thread_id": "session-42"}}
# Long-term: scoped by namespace
store.put(namespace=("users", "alex"), key="pref", value={...})
Thread IDs rotate each session. Namespaces stay fixed for the same user forever. Keep them straight.
Predict the output: Your graph uses summary memory. The recap says “User is building a FastAPI app.” The last 2 turns cover database choices. The user asks “What framework am I using?” Can the agent answer?
Yes — the recap lands as a system message, so the LLM sees it. Even though the framework chat was many turns ago, the summary kept it.
Complete Code
The full script below is self-contained: copy, paste, and run.
# Complete code from: Memory Systems in LangGraph
# Requires: pip install langgraph langchain-openai langchain-core
# Python 3.10+
import os
import re
from typing import Annotated, TypedDict
from langchain_openai import ChatOpenAI
from langchain_core.messages import (
HumanMessage,
AIMessage,
SystemMessage,
RemoveMessage,
)
from langchain_core.messages.utils import (
trim_messages,
count_tokens_approximately,
)
from langchain_core.runnables import RunnableConfig
from langgraph.graph import (
StateGraph,
MessagesState,
START,
END,
add_messages,
)
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.store.memory import InMemoryStore
from langgraph.store.base import BaseStore
model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
# --- 1. Basic short-term memory ---
def chatbot_basic(state: MessagesState):
response = model.invoke(state["messages"])
return {"messages": [response]}
graph_basic = StateGraph(MessagesState)
graph_basic.add_node("chatbot", chatbot_basic)
graph_basic.add_edge(START, "chatbot")
graph_basic.add_edge("chatbot", END)
app_basic = graph_basic.compile(checkpointer=InMemorySaver())
config_basic = {"configurable": {"thread_id": "demo-1"}}
r1 = app_basic.invoke(
{"messages": [HumanMessage(content="My name is Alex.")]},
config=config_basic,
)
print("Basic memory test:")
print(r1["messages"][-1].content)
r2 = app_basic.invoke(
{"messages": [HumanMessage(content="What's my name?")]},
config=config_basic,
)
print(r2["messages"][-1].content)
# --- 2. Sliding window memory ---
def chatbot_windowed(state: MessagesState):
trimmed = trim_messages(
state["messages"],
strategy="last",
token_counter=count_tokens_approximately,
max_tokens=500,
start_on="human",
)
response = model.invoke(trimmed)
return {"messages": [response]}
graph_w = StateGraph(MessagesState)
graph_w.add_node("chatbot", chatbot_windowed)
graph_w.add_edge(START, "chatbot")
graph_w.add_edge("chatbot", END)
app_windowed = graph_w.compile(checkpointer=InMemorySaver())
# --- 3. Summary memory ---
class SummaryState(TypedDict):
messages: Annotated[list, add_messages]
summary: str
def chatbot_with_summary(state: SummaryState):
messages = state["messages"]
summary = state.get("summary", "")
if summary:
system_msg = SystemMessage(
content=f"Conversation summary: {summary}"
)
recent = trim_messages(
messages,
strategy="last",
token_counter=count_tokens_approximately,
max_tokens=300,
start_on="human",
)
full_context = [system_msg] + recent
else:
full_context = messages
response = model.invoke(full_context)
return {"messages": [response]}
def should_summarize(state: SummaryState):
if len(state["messages"]) > 10:
return "summarize"
return END
def summarize_conversation(state: SummaryState):
messages = state["messages"]
existing = state.get("summary", "")
if existing:
prompt = (
f"Current summary:\n{existing}\n\n"
"Extend this summary with the new messages. "
"Capture key facts, decisions, preferences."
)
else:
prompt = (
"Summarize the conversation so far. "
"Capture key facts, decisions, preferences."
)
summary_msgs = messages + [HumanMessage(content=prompt)]
new_summary = model.invoke(summary_msgs)
delete = [
RemoveMessage(id=m.id) for m in messages[:-2]
]
return {
"summary": new_summary.content,
"messages": delete,
}
graph_s = StateGraph(SummaryState)
graph_s.add_node("chatbot", chatbot_with_summary)
graph_s.add_node("summarize", summarize_conversation)
graph_s.add_edge(START, "chatbot")
graph_s.add_conditional_edges(
"chatbot",
should_summarize,
{"summarize": "summarize", END: END},
)
graph_s.add_edge("summarize", END)
app_summary = graph_s.compile(checkpointer=InMemorySaver())
# --- 4. Long-term memory with Store ---
store = InMemoryStore()
def chatbot_with_memory(
state: MessagesState,
config: RunnableConfig,
*,
store: BaseStore,
):
user_id = config["configurable"]["user_id"]
memories = store.search(namespace=("users", user_id))
memory_text = "\n".join(
f"- {m.key}: {m.value}" for m in memories
)
system_msg = SystemMessage(
content=(
"You are a helpful assistant. "
"User info:\n" + memory_text + "\n"
"If the user shares new preferences, "
"note them as [MEMORY: key=value]."
)
)
response = model.invoke([system_msg] + state["messages"])
return {"messages": [response]}
def save_memories(
state: MessagesState,
config: RunnableConfig,
*,
store: BaseStore,
):
user_id = config["configurable"]["user_id"]
last_message = state["messages"][-1].content
pattern = r"\[MEMORY:\s*(\w+)=(.+?)\]"
matches = re.findall(pattern, last_message)
for key, value in matches:
store.put(
namespace=("users", user_id),
key=key,
value={"fact": value.strip()},
)
return {"messages": []}
graph_lt = StateGraph(MessagesState)
graph_lt.add_node("chatbot", chatbot_with_memory)
graph_lt.add_node("save_memories", save_memories)
graph_lt.add_edge(START, "chatbot")
graph_lt.add_edge("chatbot", "save_memories")
graph_lt.add_edge("save_memories", END)
app_longterm = graph_lt.compile(
checkpointer=InMemorySaver(),
store=store,
)
print("\nLong-term memory test:")
config_lt1 = {
"configurable": {
"thread_id": "session-1",
"user_id": "alex",
}
}
r3 = app_longterm.invoke(
{"messages": [HumanMessage(
content="I prefer dark mode and Python 3.12."
)]},
config=config_lt1,
)
print(r3["messages"][-1].content)
config_lt2 = {
"configurable": {
"thread_id": "session-2",
"user_id": "alex",
}
}
r4 = app_longterm.invoke(
{"messages": [HumanMessage(
content="What do you remember about me?"
)]},
config=config_lt2,
)
print(r4["messages"][-1].content)
print("\nScript completed successfully.")
FAQ
Does InMemorySaver survive a Python restart?
No. It keeps data in RAM only. Kill the process and it’s all gone. For lasting storage, switch to SqliteSaver, PostgresSaver, or any disk-backed option. The interface is the same — only the class name changes.
Can I use long-term memory without a checkpointer?
Yes. They’re fully independent. Compile with store=my_store and skip the checkpointer. The agent won’t track the current chat across calls, but it can still read and write long-term facts from the Store.
How do I delete facts from the Store?
Use store.delete(namespace=("users", "alex"), key="preference") to drop a single item. To wipe all facts for a user, search the namespace first and loop through the results:
items = store.search(namespace=("users", "alex"))
for item in items:
store.delete(namespace=("users", "alex"), key=item.key)
What’s the gap between trim_messages and RemoveMessage?
trim_messages builds a trimmed copy and leaves the real state alone. Use it right before an LLM call. RemoveMessage edits the graph’s state for real — it strips messages from the checkpoint. Use it when you want to cut down what’s stored, like right after a summary step.
Can I do vector search with the Store?
Yes. InMemoryStore has built-in vector search — just pass it an embedding function. Then call store.search(namespace=(...), query="user's coding prefs") and results come back ranked by meaning. Very useful when you have many stored facts and need to grab the most relevant ones.
Summary
Memory is what makes a raw LLM feel like a real agent. LangGraph hands you three ways to build it:
- Short-term memory — a checkpointer keeps message history inside one thread. One line to set up.
- Sliding window / summary memory — controls token costs by trimming or compressing old turns. A must for long-running chats.
- Long-term memory — the Store holds facts across threads and sessions, organized by namespaces.
Most production agents blend short-term and long-term. Toss in summary memory when chats regularly go past 20 rounds.
Practice exercise: Build a chatbot that uses all three types. Store user prefs in a Store, summarize after 8 messages, and trim to 500 tokens before each LLM call. Test across two sessions — the second should recall prefs from the first.
Solution
Blend the production_chatbot function from the “Combining Memory Types” section with save_memories and summarize_conversation. Your graph needs four nodes: chatbot, save_memories, a branch for summarization, and the summarize node. Use SummaryState as the state class. Compile with both checkpointer=InMemorySaver() and store=InMemoryStore().
The key insight: the chatbot node reads from all three sources (Store, summary, trimmed messages), the save_memories node writes to the Store, and the summarize node compresses the message history.