LangGraph Multi-Agent Research Assistant Project

Build a multi-agent research pipeline in LangGraph with planner, researcher, writer, and reviewer agents that turn any topic into a polished report.

Written by Selva Prabhakaran | 29 min read

Build a multi-agent research system where a planner, researcher, and writer collaborate to produce comprehensive reports from web sources.

You need a report on a new topic. You paste your question into ChatGPT and get a decent summary, but it lacks depth. It does not cross-check sources or lay out findings the way you would. So you do it the hard way — search, read, take notes, organize, write. That eats up hours. What if you could build a system where three agents handle the whole thing for you?

That is what we are building here. One agent plans the research by breaking your question into parts. A second agent searches the web for each part and pulls out key facts. A third agent takes all those facts and turns them into a polished report. And a fourth agent reviews the output and can send it back for another pass. The whole system runs on LangGraph, with conditional routing to manage the handoffs.

Let me walk you through how data moves through the system before we touch any code.

You give it a topic — say, “What are the latest advances in protein folding prediction?” The planner reads this and splits it into 3 to 5 focused sub-questions. Each one targets a different angle: recent breakthroughs, key methods, leading teams, real-world uses.

Those sub-questions go to the researcher. For each one, it searches the web, gathers what it finds, and writes up a short summary with source links. By the end, you have a set of sourced findings that cover the topic from every side.

The writer takes all those findings and shapes them into a report. It groups them by theme, writes section summaries, adds a top-level overview, and lists every source. The result is a clean document you could hand to your team.

Then the reviewer steps in. It reads the report and checks it against the original sub-questions. If the coverage is solid, it gives the green light. If something is missing, it sends the work back to the researcher for another round. A counter stops it from looping forever.

We will build each piece, connect them in a graph, and run the full pipeline on a real research question.

Why Is This a Multi-Agent Problem?

Could one agent do all of this? In theory, yes. In practice, it would struggle.

A single agent with a prompt like “plan research, search the web, and write a report” runs into three problems right away. Its context window fills up with tool descriptions it does not need at each stage. Its prompt tries to juggle three roles at once, which leads to confused output. And when something breaks, you cannot tell which step went wrong.

Multi-agent systems solve this by giving each agent a single job:

A focused prompt. The planner only thinks about breaking down questions. The researcher only thinks about finding facts. The writer only thinks about clear prose.
Only the tools it needs. The planner needs no tools at all — it just reasons. The researcher gets a search tool. The writer works with the gathered facts.
Its own slice of the state. No mix-ups between stages.

Approach	Prompt Complexity	Tool Clarity	Debuggability
Single agent	High — one prompt covers all roles	Low — all tools visible always	Hard — can’t isolate failures
Multi-agent	Low — each prompt is focused	High — each agent sees only its tools	Easy — trace shows which agent failed

Key Insight: Multi-agent setups do not make any single LLM smarter. They make the whole system smarter by giving each agent a narrow, clear task — the same idea behind any good human team.

Prerequisites

Python version: 3.10+
Required libraries: langgraph (0.4+), langchain-openai (0.3+), langchain-core (0.3+), langchain-community (0.3+), tavily-python (0.5+)
Install: pip install langgraph langchain-openai langchain-core langchain-community tavily-python
API keys: An OpenAI API key (OPENAI_API_KEY) and a Tavily API key (TAVILY_API_KEY). See OpenAI’s docs and Tavily’s docs to create them.
Time to complete: ~45 minutes
Prior knowledge: Basic LangGraph concepts (nodes, edges, state) from earlier posts in this series.

How Do You Set Up the Project?

Let’s start with imports. We need ChatOpenAI to talk to the LLM, TavilySearchResults for web search, and StateGraph from LangGraph to build our agent workflow. We also bring in TypedDict, which lets us define a typed shared state — the single data object that all four agents will read from and write to.

python

import os
import json
from typing import Annotated, TypedDict
from datetime import datetime

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage
from langchain_community.tools.tavily_search import TavilySearchResults
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import create_react_agent

os.environ["OPENAI_API_KEY"] = "your-openai-key-here"
os.environ["TAVILY_API_KEY"] = "your-tavily-key-here"

model = ChatOpenAI(model="gpt-4o-mini", temperature=0)

Warning: Never put API keys directly in your code for production. Use a `.env` file with `python-dotenv` instead. The keys above are just placeholders — swap them for your own before you run anything.

How Do You Design the Shared State?

In LangGraph, every graph revolves around a state object. Picture it as a shared notepad pinned to the wall. Any agent can read it, and any agent can write to it. For our project, this notepad tracks the user’s question, the sub-questions the planner comes up with, every finding the researcher digs up, and the final report.

One field worth noting is research_findings. It is a growing list — every time the researcher finishes a sub-question, it adds a new dict with the question text, an answer, and a list of source URLs. Another key field is current_step, which acts like a traffic light telling the router which agent goes next.

python

class ResearchState(TypedDict):
    messages: Annotated[list, add_messages]
    research_query: str
    sub_questions: list[str]
    research_findings: list[dict]
    final_report: str
    current_step: str
    review_count: int

That is seven fields total. The messages field uses LangGraph’s add_messages reducer, which means new messages get added to the list rather than wiping out old ones. The review_count field is a safety cap that stops the reviewer from going around in circles — we will get into that later.

You might wonder: why not just use MessagesState and keep everything in a flat list? Because a flat list forces every agent to dig through raw text to find what it needs. With named, typed fields, the planner drops its output into sub_questions, the researcher fills research_findings, and the writer reads both directly. No parsing, no guessing.

Key Insight: Your state schema is the contract between agents. Typed fields let agents share data cleanly without anyone having to parse raw chat messages.

How Does the Planner Agent Work?

The planner is the first agent in the chain. Its job is simple but vital: take a big, broad research question and carve it into 3 to 5 sharp sub-questions. The quality of these sub-questions shapes the entire report. Vague sub-questions lead to vague findings. Tight ones lead to solid, useful research.

This agent needs zero tools. All it does is reason about how to split the topic. We define it as a regular Python function that returns a dict of state updates. LangGraph takes that dict and folds it into the existing state for us.

python

def planner_agent(state: ResearchState) -> dict:
    """Break the research query into focused sub-questions."""
    query = state["research_query"]

    planner_prompt = f"""You are a research planner. Break down
a research question into 3-5 focused sub-questions that together
provide comprehensive coverage of the topic.

Research question: {query}

Return ONLY a JSON list of strings. Each string is one sub-question.
Example: ["What is X?", "How does X compare to Y?"]
"""

    response = model.invoke([HumanMessage(content=planner_prompt)])
    sub_questions = json.loads(response.content)

    return {
        "sub_questions": sub_questions,
        "current_step": "research",
        "messages": [AIMessage(
            content=f"Created {len(sub_questions)} sub-questions: {sub_questions}",
            name="planner"
        )]
    }

Pay attention to the return value. The function hands back a dict with only the fields it wants to update. It does not touch research_findings or final_report because those are not its concern.

[UNDER THE HOOD]
Why return a dict and not the full state? LangGraph works by merging. If an agent hands back {"sub_questions": [...], "current_step": "research"}, only those two fields get changed. The rest of the state stays exactly as it was. This is a built-in safety net — agents cannot wipe out each other’s data by accident.

Before you run this, try to guess: If the topic is “What are the benefits of meditation?”, what sub-questions do you think the planner would create? You would likely see one about physical health, one about mental health, one about the science behind it, and one about how to start a practice.

How Does the Researcher Agent Work?

This agent does the real legwork. It takes the list of sub-questions from the planner, fires off a web search for each one using Tavily, and packages what it finds into a neat dict: the question, a summary answer, and a list of source URLs.

We cap Tavily at 3 results per query. Going higher just adds noise without making the summary much better. The researcher walks through sub-questions one by one, grabs search results, and asks the LLM to boil them down.

python

search_tool = TavilySearchResults(max_results=3)

def researcher_agent(state: ResearchState) -> dict:
    """Research each sub-question using web search."""
    sub_questions = state["sub_questions"]
    findings = []
    errors = []

    for question in sub_questions:
        try:
            search_results = search_tool.invoke({"query": question})
            if not search_results:
                errors.append(f"No results for: {question}")
                continue

            context = "\n".join([
                f"Source: {r['url']}\nContent: {r['content']}"
                for r in search_results
            ])

            research_prompt = f"""Answer this question based on search results.
Return valid JSON: {{"question": "...", "answer": "...", "sources": [...]}}

Question: {question}
Search Results:
{context}"""

            response = model.invoke([HumanMessage(content=research_prompt)])
            finding = json.loads(response.content)
            findings.append(finding)

        except json.JSONDecodeError:
            errors.append(f"JSON parse error for: {question}")
        except Exception as e:
            errors.append(f"Error researching '{question}': {str(e)}")

    error_msg = f" Errors: {errors}" if errors else ""
    return {
        "research_findings": findings,
        "current_step": "writing",
        "messages": [AIMessage(
            content=f"Researched {len(findings)}/{len(sub_questions)}.{error_msg}",
            name="researcher"
        )]
    }

Notice the error handling. If one search bombs, the rest still run to the end. The try-except block catches both API hiccups and JSON parsing failures. Any errors get logged in the agent’s message so you can trace them later.

Tip: Pick `max_results` based on what you need. For broad, open-ended research, use 5 to 7 results. For checking a specific claim, 2 or 3 is enough. More results means higher API costs and slower runs.

Warning: Always check LLM JSON output before you use it. The `json.loads()` call will fail if the model wraps JSON in markdown fences or adds extra text. For production, use LangChain’s `JsonOutputParser` or `PydanticOutputParser` for clean, typed output every time.

Exercise 1: Add a Relevance Filter to the Researcher

Right now, the researcher uses every search result no matter the quality. Add a scoring step that drops low-value results before the summary.

python

type: 'exercise'
id: 'relevance-filter'
title: 'Exercise 1: Add a Relevance Filter'
difficulty: 'advanced'
exerciseType: 'write'
instructions: |
  Modify the researcher to score each search result's relevance (0-10)
  before including it in the context. Only include results scoring 7+.
  If no results pass the threshold, include the top result anyway.
starterCode: |
  def filter_relevant_results(question, search_results, model):
      """Score and filter search results by relevance."""
      filtered = []
      for result in search_results:
          # TODO: Ask the model to score relevance 0-10
          # TODO: Include only results scoring 7+
          pass

      # If nothing passed, keep the best result
      if not filtered and search_results:
          filtered = [search_results[0]]
      return filtered
testCases:
  - id: 'tc1'
    input: 'print(len(filter_relevant_results("test", [{"content": "relevant"}, {"content": "irrelevant"}], model)))'
    expectedOutput: '# Output depends on model scoring'
    description: 'Should return filtered results'
  - id: 'tc2'
    input: 'print(type(filter_relevant_results("test", [], model)))'
    expectedOutput: "<class 'list'>"
    description: 'Should return a list even with empty input'
hints:
  - 'Ask the model: "Score the relevance of this content to the question on a scale of 0-10. Return only the number."'
  - 'Parse the score with int(response.content.strip()), then check if score >= 7'
solution: |
  def filter_relevant_results(question, search_results, model):
      filtered = []
      for result in search_results:
          score_prompt = f"Score 0-10 how relevant this is to '{question}': {result['content'][:200]}. Return ONLY the number."
          response = model.invoke([HumanMessage(content=score_prompt)])
          try:
              score = int(response.content.strip())
              if score >= 7:
                  filtered.append(result)
          except ValueError:
              continue
      if not filtered and search_results:
          filtered = [search_results[0]]
      return filtered
solutionExplanation: |
  The function asks the LLM to score each result individually.
  Results below 7/10 are dropped. If nothing passes, we keep
  the best available result to avoid empty findings.
xpReward: 20

How Does the Writer Agent Work?

Once the researcher is done, the writer takes the stage. It reads every finding and stitches them into a single, polished report. No searching, no planning — just organizing and presenting. I deliberately keep this agent tool-free. Its entire purpose is turning raw data into something a human wants to read.

The writer’s prompt is the longest of the four agents, and for good reason: format matters. A useful report needs an overview paragraph up front, sections grouped by theme (not by question), and a full source list at the bottom. Grouping by theme is a small detail that makes a big difference in how the report reads.

python

def writer_agent(state: ResearchState) -> dict:
    """Synthesize research findings into a structured report."""
    query = state["research_query"]
    findings = state.get("research_findings", [])

    if not findings:
        return {
            "final_report": "No research findings available.",
            "current_step": "complete",
            "messages": [AIMessage(content="No findings to write.", name="writer")]
        }

    findings_text = "\n\n".join([
        f"Q: {f['question']}\nA: {f['answer']}\nSources: {', '.join(f['sources'])}"
        for f in findings
    ])

    writer_prompt = f"""Synthesize these findings into a research report.

Original question: {query}

Research findings:
{findings_text}

Structure the report as:
1. Executive Summary (2-3 sentences)
2. Key Findings (organized by theme, not by question)
3. Detailed Analysis (expand on each finding with source citations)
4. Sources (list all unique URLs)

Write clearly. Use short paragraphs. Cite sources inline."""

    response = model.invoke([HumanMessage(content=writer_prompt)])

    return {
        "final_report": response.content,
        "current_step": "review",
        "messages": [AIMessage(content="Report draft completed.", name="writer")]
    }

See how the writer grabs research_findings right out of the state? No need to parse messages. Also notice that it sets current_step to “review” rather than “complete.” The draft is not final until the reviewer says so.

The if not findings check at the top is a guard rail. If the researcher came back empty on every sub-question, there is nothing to write about. Without that check, the writer would either crash or produce gibberish.

How Does the Quality Reviewer Work?

Now things get interesting. A simple pipeline — plan, then research, then write — gets the job done, but it has a blind spot. What if the researcher skipped a key angle? What if the report jumps around and reads poorly?

That is where the reviewer comes in. It reads the draft report, compares it to the original sub-questions, and makes a judgment: good enough, or needs more work. If the verdict is “revise,” the graph loops the researcher back for another pass. The feedback travels through the message history so the researcher knows what to focus on. This loop is what turns a pipeline into a quality system.

python

def reviewer_agent(state: ResearchState) -> dict:
    """Review the report quality and decide next steps."""
    if state.get("review_count", 0) >= 2:
        return {
            "current_step": "complete",
            "messages": [AIMessage(
                content="Max reviews reached. Approving report.",
                name="reviewer"
            )]
        }

    report = state["final_report"]
    sub_questions = state["sub_questions"]

    review_prompt = f"""You are a research quality reviewer.
Review this report against the original sub-questions.

Sub-questions: {json.dumps(sub_questions)}

Report:
{report}

Rate as APPROVED or NEEDS_REVISION.
If NEEDS_REVISION, say what's missing in 1-2 sentences.
Return JSON: {{"verdict": "APPROVED" or "NEEDS_REVISION", "feedback": "..."}}
"""

    response = model.invoke([HumanMessage(content=review_prompt)])
    review = json.loads(response.content)

    if review["verdict"] == "APPROVED":
        return {
            "current_step": "complete",
            "messages": [AIMessage(content="Report APPROVED.", name="reviewer")]
        }
    else:
        return {
            "current_step": "research",
            "review_count": state.get("review_count", 0) + 1,
            "messages": [AIMessage(
                content=f"Revision needed: {review['feedback']}",
                name="reviewer"
            )]
        }

See the review_count check at the very top? That is your escape hatch. A fussy reviewer could send work back over and over, eating up API credits each time. Capping it at two rounds keeps costs under control. After two revisions, take what you have and move on.

Tip: A review count cap is a must for any graph with cycles. LangGraph also has a built-in `recursion_limit` (25 steps by default), but that is a last resort. Your own counter gives you tighter control and clearer error messages.

Exercise 2: Build a Scoring Reviewer

The reviewer above gives a simple yes-or-no verdict. Build a version that scores the report on three axes and only approves if every score clears a bar.

python

type: 'exercise'
id: 'scoring-reviewer'
title: 'Exercise 2: Multi-Dimension Review Scoring'
difficulty: 'advanced'
exerciseType: 'write'
instructions: |
  Modify the reviewer to score the report on three dimensions:
  coverage (0-10), depth (0-10), and coherence (0-10).
  Approve only if ALL scores are 7+. Return the scores in the state.
starterCode: |
  def scoring_reviewer(state: ResearchState) -> dict:
      report = state["final_report"]
      sub_questions = state["sub_questions"]

      review_prompt = f"""Score this report on 3 dimensions (0-10 each):
  - coverage: does it address all sub-questions?
  - depth: are answers substantive or shallow?
  - coherence: does it flow logically?

  Sub-questions: {json.dumps(sub_questions)}
  Report: {report}

  Return JSON: {{"coverage": N, "depth": N, "coherence": N}}"""

      response = model.invoke([HumanMessage(content=review_prompt)])
      scores = json.loads(response.content)

      # TODO: Check if all scores >= 7
      # TODO: Return appropriate current_step
      pass
testCases:
  - id: 'tc1'
    input: 'print("approved" if all(v >= 7 for v in {"coverage": 8, "depth": 9, "coherence": 7}.values()) else "revision")'
    expectedOutput: 'approved'
    description: 'All scores 7+ should approve'
  - id: 'tc2'
    input: 'print("approved" if all(v >= 7 for v in {"coverage": 8, "depth": 5, "coherence": 7}.values()) else "revision")'
    expectedOutput: 'revision'
    description: 'Any score below 7 should trigger revision'
hints:
  - 'Use all(score >= 7 for score in scores.values()) to check the threshold'
  - 'If approved, set current_step to "complete". If not, set it to "research" and increment review_count'
solution: |
  def scoring_reviewer(state: ResearchState) -> dict:
      if state.get("review_count", 0) >= 2:
          return {"current_step": "complete",
                  "messages": [AIMessage(content="Max reviews.", name="reviewer")]}

      report = state["final_report"]
      sub_questions = state["sub_questions"]
      review_prompt = f"""Score this report (0-10 each):
      coverage, depth, coherence.
      Sub-questions: {json.dumps(sub_questions)}
      Report: {report}
      Return JSON: {{"coverage": N, "depth": N, "coherence": N}}"""

      response = model.invoke([HumanMessage(content=review_prompt)])
      scores = json.loads(response.content)
      approved = all(v >= 7 for v in scores.values())

      if approved:
          return {"current_step": "complete",
                  "messages": [AIMessage(content=f"APPROVED. Scores: {scores}", name="reviewer")]}
      else:
          return {"current_step": "research",
                  "review_count": state.get("review_count", 0) + 1,
                  "messages": [AIMessage(content=f"Scores: {scores}. Needs revision.", name="reviewer")]}
solutionExplanation: |
  The scoring reviewer uses structured scores instead of a binary verdict.
  The all() check ensures every dimension meets the 7/10 threshold.
  The review_count safeguard still prevents infinite loops.
xpReward: 20

How Do You Wire the Full Graph?

Now we connect the dots. Each agent becomes a node in the graph, and conditional edges decide who runs next. The route_step function is the traffic cop — it reads current_step from the state and picks the matching node.

python

def route_step(state: ResearchState) -> str:
    """Route to the next agent based on current step."""
    step = state.get("current_step", "planning")
    routes = {
        "planning": "planner",
        "research": "researcher",
        "writing": "writer",
        "review": "reviewer",
        "complete": END,
    }
    return routes.get(step, END)

workflow = StateGraph(ResearchState)

workflow.add_node("planner", planner_agent)
workflow.add_node("researcher", researcher_agent)
workflow.add_node("writer", writer_agent)
workflow.add_node("reviewer", reviewer_agent)

workflow.add_conditional_edges(START, route_step)
workflow.add_conditional_edges("planner", route_step)
workflow.add_conditional_edges("researcher", route_step)
workflow.add_conditional_edges("writer", route_step)
workflow.add_conditional_edges("reviewer", route_step)

graph = workflow.compile()

Here is the path the data takes: START goes to the planner, then to the researcher, then to the writer, then to the reviewer. From there, the graph either ends or sends the work back to the researcher. Each agent writes its “next stop” recommendation into current_step, and route_step follows it.

You might wonder: why not use fixed edges for the straight parts? Because this flexible pattern makes it easy to grow. Want to add a fact-checker between the researcher and writer later? Just drop in a new node and add one line to the routing dict. Nothing else changes.

Key Insight: When routing is driven by a state field, your graph becomes easy to extend. Each agent says where the work should go next, and the router obeys. You never have to untangle hard-wired edges.

Here is what the graph looks like in a diagram:

python

START --> planner --> researcher --> writer --> reviewer --+--> END
                       ^                                  |
                       +---------- (revision) -----------+

How Do You Run the Full Pipeline?

Let’s put the whole system to the test. We set up a starting state and call graph.invoke(). Because current_step begins at “planning”, the router hands control to the planner right away.

python

result = graph.invoke({
    "messages": [HumanMessage(content="Start research")],
    "research_query": "What are the latest advances in protein folding prediction?",
    "sub_questions": [],
    "research_findings": [],
    "final_report": "",
    "current_step": "planning",
    "review_count": 0,
})

When the graph wraps up, you can inspect each part of the output separately.

python

print("=== Sub-Questions ===")
for i, q in enumerate(result["sub_questions"], 1):
    print(f"{i}. {q}")

print("\n=== Research Findings ===")
for f in result["research_findings"]:
    print(f"\nQ: {f['question']}")
    print(f"A: {f['answer'][:150]}...")

print("\n=== Final Report (first 500 chars) ===")
print(result["final_report"][:500])

print("\n=== Agent Trace ===")
for msg in result["messages"]:
    if hasattr(msg, "name") and msg.name:
        print(f"[{msg.name}] {msg.content[:120]}")

The trace tells the full story step by step: first the planner splits the topic, then the researcher digs up facts, then the writer shapes the draft, and the reviewer either approves or asks for another round. If a revision happened, you will spot the researcher appearing twice.

Quick mental test: What happens if you set current_step to “research” in the starting state? The router skips the planner entirely. But sub_questions is still an empty list, so the researcher loops over nothing and returns zero findings. The writer gets an empty list, sees the guard clause, and outputs “No research findings available.” The typed state catches the mistake on its own — no crash, just a clear signal that something went wrong upstream.

When Should You Use This Pattern (and When Not)?

The planner-researcher-writer pattern works well for:

Open-ended research where the topic needs to be broken into parts first.
Report building where many sources need to be woven into one document.
Due diligence tasks where being thorough matters more than being fast.

It is not the right pick for:

Simple fact questions (“What is the capital of France?”). One agent with a search tool handles that faster and cheaper.
Real-time apps where speed matters. Each agent adds a round-trip to the LLM. Four agents means at least four round-trips.
Deep domain work. The agents are only as good as what the search returns. For niche fields, you would need custom knowledge bases and domain tools.

Tip: Think about running the researcher in parallel for speed. The loop through sub-questions works but is slow. LangGraph’s `Send` API lets you fire off all sub-questions at once and collect the results when they are done. That drops your wait time from `N * search_time` down to roughly `1 * search_time`. See the LangGraph map-reduce docs for the full pattern.

What Are the Most Common Mistakes?

Mistake 1: Cramming everything into one agent’s prompt

Wrong:

python

prompt = """You are a planner, researcher, and writer.
First break down the question, then search for each part,
then write a report. Use the search tool for research."""

Why it fails: The agent cannot tell which role to play at each step. Tool use breaks down because the model sees every tool when it only needs one at a time.

Fix: Split the work across agents with tight, focused prompts. Give each one only the tools it needs.

Mistake 2: Not checking state between agents

Wrong:

python

def writer_agent(state):
    findings = state["research_findings"]  # Crashes if empty
    report = generate_report(findings)

Fix: Always check the inputs at the top of each agent:

python

def writer_agent(state):
    findings = state.get("research_findings", [])
    if not findings:
        return {"final_report": "No findings.", "current_step": "complete"}

Mistake 3: No cap on review loops

A reviewer that always finds problems creates an endless loop. API costs stack up before LangGraph’s default step limit (25) kicks in.

Fix: Track the review count in your state and stop after 2 or 3 rounds:

python

if state.get("review_count", 0) >= 2:
    return {"current_step": "complete"}

Exercise 3: Add a Fact-Checker Agent

Build a fact-checker that sits between the researcher and writer. It should run its own search to confirm each finding before passing it along.

python

type: 'exercise'
id: 'fact-checker'
title: 'Exercise 3: Build a Fact-Checker Agent'
difficulty: 'advanced'
exerciseType: 'write'
instructions: |
  Create a fact_checker_agent that:
  1. Takes each finding from research_findings
  2. Searches the web to verify the finding's answer
  3. Flags findings as verified or unverified
  4. Passes only verified findings forward
  Add a "verified_findings" field to the state.
starterCode: |
  def fact_checker_agent(state: ResearchState) -> dict:
      """Verify each finding with an independent search."""
      findings = state["research_findings"]
      verified = []

      for finding in findings:
          # TODO: Search to verify the finding
          # TODO: Ask model if evidence supports the claim
          # TODO: Append to verified if supported
          pass

      return {
          "research_findings": verified,  # Replace with only verified
          "current_step": "writing",
          "messages": [AIMessage(
              content=f"Verified {len(verified)}/{len(findings)}.",
              name="fact_checker"
          )]
      }
testCases:
  - id: 'tc1'
    input: 'print(type(fact_checker_agent({"research_findings": [], "current_step": "verify"})))'
    expectedOutput: "<class 'dict'>"
    description: 'Should return a dict'
hints:
  - 'Search with: search_tool.invoke({"query": f"verify: {finding[\"answer\"][:100]}"})'
  - 'Ask the model: "Does this evidence support the claim? Return JSON: {\"supported\": true/false}"'
solution: |
  def fact_checker_agent(state: ResearchState) -> dict:
      findings = state["research_findings"]
      verified = []
      for finding in findings:
          try:
              results = search_tool.invoke({"query": f"verify: {finding['answer'][:100]}"})
              if results:
                  check_prompt = f"Does this evidence support the claim?\nClaim: {finding['answer']}\nEvidence: {results[0]['content'][:300]}\nReturn JSON: {{\"supported\": true/false}}"
                  resp = model.invoke([HumanMessage(content=check_prompt)])
                  check = json.loads(resp.content)
                  if check.get("supported", False):
                      verified.append(finding)
          except Exception:
              continue
      return {"research_findings": verified, "current_step": "writing",
              "messages": [AIMessage(content=f"Verified {len(verified)}/{len(findings)}.", name="fact_checker")]}
solutionExplanation: |
  The fact-checker searches for independent evidence of each claim,
  then asks the LLM to compare the evidence against the original finding.
  Only findings with supporting evidence pass through. Failed verifications
  are silently dropped to keep the report trustworthy.
xpReward: 20

Complete Code

Click to expand the full script (copy-paste and run)

python

# Complete code from: Build a Multi-Agent Research Assistant in LangGraph
# Requires: pip install langgraph langchain-openai langchain-core langchain-community tavily-python
# Python 3.10+

import os
import json
from typing import Annotated, TypedDict
from datetime import datetime

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage, AIMessage
from langchain_community.tools.tavily_search import TavilySearchResults
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import create_react_agent

# --- Configuration ---
os.environ["OPENAI_API_KEY"] = "your-openai-key-here"
os.environ["TAVILY_API_KEY"] = "your-tavily-key-here"

model = ChatOpenAI(model="gpt-4o-mini", temperature=0)
search_tool = TavilySearchResults(max_results=3)

# --- State ---
class ResearchState(TypedDict):
    messages: Annotated[list, add_messages]
    research_query: str
    sub_questions: list[str]
    research_findings: list[dict]
    final_report: str
    current_step: str
    review_count: int

# --- Agents ---
def planner_agent(state: ResearchState) -> dict:
    query = state["research_query"]
    planner_prompt = f"""Break this research question into 3-5 focused sub-questions.
Return ONLY a JSON list of strings.

Research question: {query}"""

    response = model.invoke([HumanMessage(content=planner_prompt)])
    sub_questions = json.loads(response.content)

    return {
        "sub_questions": sub_questions,
        "current_step": "research",
        "messages": [AIMessage(
            content=f"Created {len(sub_questions)} sub-questions.",
            name="planner"
        )]
    }

def researcher_agent(state: ResearchState) -> dict:
    sub_questions = state["sub_questions"]
    findings = []
    errors = []

    for question in sub_questions:
        try:
            search_results = search_tool.invoke({"query": question})
            if not search_results:
                errors.append(f"No results for: {question}")
                continue

            context = "\n".join([
                f"Source: {r['url']}\nContent: {r['content']}"
                for r in search_results
            ])

            research_prompt = f"""Answer this question based on the search results.
Return valid JSON: {{"question": "...", "answer": "...", "sources": [...]}}

Question: {question}
Search Results:
{context}"""

            response = model.invoke([HumanMessage(content=research_prompt)])
            finding = json.loads(response.content)
            findings.append(finding)

        except json.JSONDecodeError:
            errors.append(f"JSON parse error for: {question}")
        except Exception as e:
            errors.append(f"Error: {str(e)}")

    error_msg = f" Errors: {errors}" if errors else ""
    return {
        "research_findings": findings,
        "current_step": "writing",
        "messages": [AIMessage(
            content=f"Researched {len(findings)}/{len(sub_questions)}.{error_msg}",
            name="researcher"
        )]
    }

def writer_agent(state: ResearchState) -> dict:
    query = state["research_query"]
    findings = state.get("research_findings", [])

    if not findings:
        return {
            "final_report": "No research findings available.",
            "current_step": "complete",
            "messages": [AIMessage(content="No findings to write.", name="writer")]
        }

    findings_text = "\n\n".join([
        f"Q: {f['question']}\nA: {f['answer']}\nSources: {', '.join(f['sources'])}"
        for f in findings
    ])

    writer_prompt = f"""Synthesize these findings into a research report.

Original question: {query}
Findings:
{findings_text}

Structure: Executive Summary, Key Findings, Detailed Analysis, Sources.
Write clearly with short paragraphs and inline citations."""

    response = model.invoke([HumanMessage(content=writer_prompt)])

    return {
        "final_report": response.content,
        "current_step": "review",
        "messages": [AIMessage(content="Report draft completed.", name="writer")]
    }

def reviewer_agent(state: ResearchState) -> dict:
    if state.get("review_count", 0) >= 2:
        return {
            "current_step": "complete",
            "messages": [AIMessage(
                content="Max reviews reached. Approving.",
                name="reviewer"
            )]
        }

    report = state["final_report"]
    sub_questions = state["sub_questions"]

    review_prompt = f"""Review this report against the sub-questions.
Rate as APPROVED or NEEDS_REVISION.
Return JSON: {{"verdict": "APPROVED" or "NEEDS_REVISION", "feedback": "..."}}

Sub-questions: {json.dumps(sub_questions)}
Report:
{report}"""

    response = model.invoke([HumanMessage(content=review_prompt)])
    review = json.loads(response.content)

    if review["verdict"] == "APPROVED":
        return {
            "current_step": "complete",
            "messages": [AIMessage(content="Report APPROVED.", name="reviewer")]
        }
    else:
        return {
            "current_step": "research",
            "review_count": state.get("review_count", 0) + 1,
            "messages": [AIMessage(
                content=f"Revision needed: {review['feedback']}",
                name="reviewer"
            )]
        }

# --- Routing ---
def route_step(state: ResearchState) -> str:
    step = state.get("current_step", "planning")
    routes = {
        "planning": "planner",
        "research": "researcher",
        "writing": "writer",
        "review": "reviewer",
        "complete": END,
    }
    return routes.get(step, END)

# --- Graph ---
workflow = StateGraph(ResearchState)
workflow.add_node("planner", planner_agent)
workflow.add_node("researcher", researcher_agent)
workflow.add_node("writer", writer_agent)
workflow.add_node("reviewer", reviewer_agent)

workflow.add_conditional_edges(START, route_step)
workflow.add_conditional_edges("planner", route_step)
workflow.add_conditional_edges("researcher", route_step)
workflow.add_conditional_edges("writer", route_step)
workflow.add_conditional_edges("reviewer", route_step)

graph = workflow.compile()

# --- Run ---
result = graph.invoke({
    "messages": [HumanMessage(content="Start research")],
    "research_query": "What are the latest advances in protein folding prediction?",
    "sub_questions": [],
    "research_findings": [],
    "final_report": "",
    "current_step": "planning",
    "review_count": 0,
})

# --- Output ---
print("=== Sub-Questions ===")
for i, q in enumerate(result["sub_questions"], 1):
    print(f"{i}. {q}")

print("\n=== Final Report ===")
print(result["final_report"])

print("\n=== Agent Trace ===")
for msg in result["messages"]:
    if hasattr(msg, "name") and msg.name:
        print(f"[{msg.name}] {msg.content[:120]}")

print("\nScript completed successfully.")

Summary

You now have a working four-agent research system built from scratch. Let’s recap the key ideas:

Typed state fields give each agent a clear place to read and write, with no need to parse messages.
Conditional routing based on current_step means the state itself drives the flow. No rigid wiring needed.
A review loop with a cap adds quality control without the risk of endless cycles.
Per-agent error handling means one failed sub-question does not take down the whole run.

The beauty of this design is how easy it is to extend. Point the researcher at a private knowledge base and you have an internal research tool. Plug in a citation formatter and you have an academic helper. Swap the writer for a slide builder and you have a deck generator.

The foundation you built here — typed state, flexible routing, focused agents, feedback loops — is the same foundation behind any serious multi-agent system in LangGraph.

Frequently Asked Questions

Can I use a different LLM for each agent?

Yes, and it is often a smart move. Create separate ChatOpenAI objects with different models. Use gpt-4o for the planner (it reasons better) and the writer (it writes better prose), but gpt-4o-mini for the researcher since it runs many times and the cost adds up. Each agent function just points to its own model.

How do I save progress across sessions?

Use LangGraph’s built-in checkpointing. Add a MemorySaver when you compile: graph = workflow.compile(checkpointer=MemorySaver()). Then pass a thread_id when you invoke. The full state — past findings and all — sticks around between runs. This lets you build on old research without starting over.

How do I run sub-questions at the same time?

LangGraph’s Send API is the answer. Instead of a for-loop in the researcher, you create a map node that sends each sub-question to its own researcher copy. LangGraph runs them all at once and gathers the results. Total wait time drops from N * search_time to about 1 * search_time.

python

from langgraph.types import Send

def dispatch_research(state):
    return [Send("research_one", {"question": q}) for q in state["sub_questions"]]

Do I have to use the Tavily API?

No. Tavily is handy because it gives you clean text, but you can plug in any search tool. DuckDuckGo (via langchain-community), SerpAPI, or even a RAG pipeline over your own docs would all work. Just swap out the search_tool object.

What about rate limits?

Add a short pause between API calls in the researcher loop: time.sleep(1). For production, use a retry library like tenacity with backoff. Tavily’s free tier gives you 1,000 searches a month. OpenAI rate limits depend on your plan.

References

LangGraph documentation — Multi-agent architectures. Link
LangGraph documentation — StateGraph and conditional edges. Link
LangChain documentation — Tavily Search integration. Link
LangGraph tutorials — Hierarchical Agent Teams. Link
LangGraph tutorials — Multi-agent collaboration. Link
OpenAI documentation — Chat Completions API. Link
Tavily documentation — Search API reference. Link
LangChain blog — Multi-agent workflows in LangGraph. Link
LangGraph documentation — Map-reduce and parallel execution. Link

Free Course

Master Core Python — Your First Step into AI/ML

Build a strong Python foundation with hands-on exercises designed for aspiring Data Scientists and AI/ML Engineers.

Start Free Course →

Trusted by 50,000+ learners

Written by

Selva Prabhakaran →

Related Course

Master Gen AI — Hands-On

Join 5,000+ students at edu.machinelearningplus.com

Explore Course

LangGraph Multi-Agent Research Assistant Project

Why Is This a Multi-Agent Problem?

Prerequisites

How Do You Set Up the Project?

How Do You Design the Shared State?

How Does the Planner Agent Work?

How Does the Researcher Agent Work?

Exercise 1: Add a Relevance Filter to the Researcher

How Does the Writer Agent Work?

How Does the Quality Reviewer Work?

Exercise 2: Build a Scoring Reviewer

How Do You Wire the Full Graph?

How Do You Run the Full Pipeline?

When Should You Use This Pattern (and When Not)?

What Are the Most Common Mistakes?

Mistake 1: Cramming everything into one agent’s prompt

Mistake 2: Not checking state between agents

Mistake 3: No cap on review loops

Exercise 3: Add a Fact-Checker Agent

Complete Code

Summary

Frequently Asked Questions

Can I use a different LLM for each agent?

How do I save progress across sessions?

How do I run sub-questions at the same time?

Do I have to use the Tavily API?

What about rate limits?

References

Machine Learning A-Z™: Hands-On Python & R In Data Science

Free Sample Videos:

Why Is This a Multi-Agent Problem?

Prerequisites

How Do You Set Up the Project?

How Do You Design the Shared State?

How Does the Planner Agent Work?

How Does the Researcher Agent Work?

Exercise 1: Add a Relevance Filter to the Researcher

How Does the Writer Agent Work?

How Does the Quality Reviewer Work?

Exercise 2: Build a Scoring Reviewer

How Do You Wire the Full Graph?

How Do You Run the Full Pipeline?

When Should You Use This Pattern (and When Not)?

What Are the Most Common Mistakes?

Mistake 1: Cramming everything into one agent’s prompt

Mistake 2: Not checking state between agents

Mistake 3: No cap on review loops

Exercise 3: Add a Fact-Checker Agent

Complete Code

Summary

Frequently Asked Questions

Can I use a different LLM for each agent?

How do I save progress across sessions?

How do I run sub-questions at the same time?

Do I have to use the Tavily API?

What about rate limits?

References

Related Articles

LLM Temperature, Top-P, and Top-K Explained — With Python Simulations

Build a Python AI Chatbot with Memory Using LangChain

OpenAI API Python Tutorial — Chat Completions, Streaming & Error Handling

Python.SQL. NumPy. All free.

Machine Learning A-Z™: Hands-On Python & R In Data Science

Free Sample Videos:

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Python.
SQL. NumPy.
All free.