*Part 5 of the LangGraph Mental Model series.*
What this article assumes: You understand the seven-module structure, can write a single-agent graph with tools and memory, and know how to pause and resume execution with interrupt(). Everything here builds on that foundation. We go from "why would I even use multiple agents?" all the way to a full multi-agent research assistant.
Imagine you've built the single-agent assistant from Part 1. It works. Then your users start asking for more: "Can it also check emails, research topics, write reports, and manage tasks, all in one conversation?" So you add tools. More tools. More instructions in the system prompt. Suddenly your agent has 15 tools and a 2,000-word system prompt, and its performance quietly gets worse. The LLM gets confused about which tool to use when. It sometimes uses the email tool for research tasks and the research tool for email tasks.
This is the cognitive overload problem, and it's the primary reason multi-agent systems exist. The solution is the same one software engineers have used for decades: split the responsibility.
Multi-agent systems in LangGraph solve three specific problems:
Cognitive overload: one LLM performing too many unrelated tasks at once. Split into specialized agents, each excellent at one thing.
Sequential bottlenecks: tasks that could run at the same time but are forced to run one after another. Parallel agents fix latency.
Complexity management: a 40-node graph is impossible to reason about. Breaking it into smaller subgraphs makes each piece understandable and testable independently.
Everything in this article addresses one of these three problems.
This is the simplest multi-agent pattern and the right starting point. One supervisor agent receives the user's request and decides which specialist agent should handle it. The specialist does the work and returns results to the supervisor, which then decides what to do next.
Think of a law firm: a senior partner (supervisor) talks to the client, understands the need, and assigns work to junior associates (specialists): one for contracts, one for litigation, one for compliance. The junior associate does the deep work. The senior partner reviews and responds to the client.
    User Input
        ↓
    [Supervisor Node]               ← decides who handles what
        ↓              ↓
    [Specialist A] [Specialist B]   ← do the actual work
        ↓              ↓
    [Supervisor Node]               ← reviews results, responds or routes again
        ↓
    Response
The key design decision here is next_agent, a field the supervisor writes to and the router reads.
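A minimal sketch of that contract, in plain Python so the routing logic can be read on its own. The field request, the specialist names email_agent and research_agent, and the FINISH sentinel are assumptions made for illustration, and a keyword rule stands in for the supervisor's LLM call.

```python
from typing import TypedDict


class SupervisorState(TypedDict):
    request: str     # the user's request (hypothetical field name)
    next_agent: str  # written by the supervisor, read by the router


# Hypothetical specialist node names, plus a sentinel meaning "we are done".
SPECIALISTS = {"email_agent", "research_agent"}
FINISH = "FINISH"


def supervisor_node(state: SupervisorState) -> dict:
    """Stand-in for an LLM decision: pick a specialist with a keyword rule."""
    text = state["request"].lower()
    if "email" in text:
        return {"next_agent": "email_agent"}
    if "research" in text:
        return {"next_agent": "research_agent"}
    return {"next_agent": FINISH}


def route_from_supervisor(state: SupervisorState) -> str:
    """Router: return the node name the supervisor chose."""
    choice = state["next_agent"]
    # Guard against a misspelled or hallucinated agent name.
    return choice if choice in SPECIALISTS else FINISH


state: SupervisorState = {"request": "Research vector databases", "next_agent": ""}
state.update(supervisor_node(state))
chosen = route_from_supervisor(state)
```

Keeping the router a pure function of state makes it easy to unit-test without calling a model.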
Each specialist has its own focused set of tools, with no overlap.
The key structure here is the return-to-supervisor edge. After every specialist completes its work, it reports back to the supervisor. The supervisor then evaluates the full conversation and decides: assign to another specialist, loop the same specialist, or finish. This is the orchestration pattern at its most fundamental.
The supervisor pattern is sequential: one specialist works at a time. But many real tasks have independent sub-tasks that can run simultaneously. Parallelization fixes the latency problem.
Think of cooking a meal: you don't finish the salad, then start the pasta, then start the sauce. You prep the salad, put the pasta water on, and start the sauce, all at the same time. The meal finishes faster because independent tasks run in parallel.
In LangGraph, parallelism is created by a single node having multiple outgoing edges to different nodes. LangGraph detects this "fan-out" pattern and runs all the destination nodes concurrently in what it calls a superstep. The results are collected before any subsequent node runs ("fan-in").
When multiple nodes run in the same superstep and try to write to the same state field, LangGraph needs to know how to combine those writes. Without a reducer, it throws an error. With operator.add, it safely concatenates the results.
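To make the reducer's job concrete, here is a plain-Python illustration of how concurrent writes get folded into one field. This is a conceptual sketch, not LangGraph's internal implementation; ParallelState and merge_updates are hypothetical names.

```python
import operator
from typing import Annotated, TypedDict, get_type_hints


class ParallelState(TypedDict):
    topic: str                                   # no reducer: expects a single writer per step
    results: Annotated[list[str], operator.add]  # reducer: concurrent writes are concatenated


def merge_updates(current: list[str], updates: list[list[str]]) -> list[str]:
    """Fold each node's write into the field using the reducer attached to it."""
    hints = get_type_hints(ParallelState, include_extras=True)
    reducer = hints["results"].__metadata__[0]  # operator.add
    for update in updates:
        current = reducer(current, update)
    return current


# Two nodes in the same superstep each return a one-item list for "results".
merged = merge_updates([], [["from node A"], ["from node B"]])
```

Because operator.add on lists is concatenation, each node should return a list, even when it has only one item to contribute.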
The fan-out is just two edges from the same source (START). LangGraph automatically detects this and runs both destinations concurrently. The fan-in happens naturally: both parallel nodes finish in the same superstep, so synthesize runs once, in the next superstep, with both results in state. If the branches had different numbers of steps, passing a list of source nodes to add_edge makes the join wait for all of them.
Parallelization with flat nodes works well for simple cases. But when each parallel "branch" is itself complex, with multiple steps, its own tools, and its own logic, you need subgraphs. A subgraph is a fully compiled StateGraph that runs as a single node inside a parent graph.
Think of departments in a company. The CEO (parent graph) delegates work to the Engineering department and the Marketing department. Each department has its own internal processes, meetings, and workflows. The CEO doesn't care about the internals: they just hand off work and receive deliverables.
Subgraphs communicate with their parent graph through shared state keys. Any key that appears in both the parent's state and the subgraph's state is automatically passed in (when the subgraph starts) and passed back out (when the subgraph finishes).
    Parent State:     {topic, cleaned_data, report_A, report_B, processed_ids}
        ↓
    Subgraph A State: {cleaned_data, internal_step_1, report_A, processed_ids}
        ← reads 'cleaned_data' from parent
        → writes 'report_A' and 'processed_ids' back to parent
    Subgraph B State: {cleaned_data, internal_step_2, report_B, processed_ids}
        ← reads 'cleaned_data' from parent
        → writes 'report_B' and 'processed_ids' back to parent
Keys that are only in the subgraph's state (like internal_step_1) are private: the parent never sees them.
The output schema (FailureAnalysisOutput, PerformanceAnalysisOutput) is the piece most tutorials skip, and then they wonder why they get state key conflicts. The output schema acts as a filter, controlling exactly which fields the subgraph exposes to the parent. Any field not in the output schema is treated as private internal state.
Parallelization and subgraphs cover the case where you know at design time how many parallel branches you'll have. But what if you don't? What if the user asks to research 5 topics today and 20 topics tomorrow? You can't hard-code 20 parallel branches.
This is the map-reduce pattern, powered by LangGraph's Send API.
Think of a book publisher assigning chapters: one editor is given the manuscript. They split it into chapters and assign each chapter to a different copy editor simultaneously. However many chapters there are (five, twenty, two), that's how many copy editors get hired. When they're all done, the results are collected and assembled into the final book. The number of parallel workers is determined at runtime, not at design time.
Send(node_name, state): imported from langgraph.constants (newer releases also export it from langgraph.types). Instead of routing to a fixed next node, Send creates a new instance of a node with a specific state payload. Return a list of Send objects from a routing function, and LangGraph launches all of them in parallel, each with its own independent state.
operator.add on the collecting field: the "reduce" step. As each parallel worker finishes and returns its result, operator.add accumulates them into a growing list in the parent state.
Worker state: each Send can include a custom state dict that doesn't have to match the parent graph's state. The worker node uses its own small local state, containing just the data it needs for its specific piece of work.
The continue_to_jokes function is where the magic happens. Notice that it returns a list of Send objects, not a string. When LangGraph sees a list of Send objects from a routing function, it launches all of them in parallel immediately. If state["subjects"] has 3 items, 3 parallel generate_joke nodes launch. If it has 20, 20 launch. The graph scales dynamically to whatever the runtime produces.
Now we combine everything (supervisor orchestration, HITL approval from Part 3, parallel subgraphs, and the Send API) into one complete production-grade system. This is the LangGraph Academy's capstone project, annotated and explained.
The system: A user provides a research topic. The system generates a team of AI analyst personas (with human approval to refine them), then runs each analyst as a parallel interview sub-agent. Each analyst interviews an AI "expert" using web search, producing a report section. Finally, a writer node compiles all sections into a final polished report.
    User provides topic
        ↓
    [create_analysts]          ← LLM generates N analyst personas
        ↓
    [human_feedback]           ← INTERRUPT: human approves or sends feedback
        ↓
    [should_continue]          ← if feedback, loop back to create_analysts
        ↓
    [initiate_research]        ← Send API: spawns N parallel interview subgraphs
      ↓ ↓ ↓                      (N parallel workers running simultaneously)
    [interview_subgraph × N]   ← each analyst interviews an "expert" via web search
        ↓
    [write_report]             ← LLM compiles all sections into a final report
```python
import operator
from typing import Annotated, TypedDict

from langgraph.graph import MessagesState
from pydantic import BaseModel, Field


# -- Analyst persona: a Pydantic model for structured LLM output --
class Analyst(BaseModel):
    name: str = Field(description="The analyst's full name")
    role: str = Field(description="Their professional role (e.g. 'Financial Analyst')")
    focus: str = Field(description="Their specific analytical focus area")

    @property
    def persona(self) -> str:
        return f"Name: {self.name}\nRole: {self.role}\nFocus: {self.focus}"


class Perspectives(BaseModel):
    analysts: list[Analyst]


# -- Outer graph state --
class ResearchGraphState(TypedDict):
    topic: str
    max_analysts: int
    human_analyst_feedback: str              # human writes here during HITL
    analysts: list[Analyst]
    sections: Annotated[list, operator.add]  # accumulated from all parallel interviews
    final_report: str


# -- Interview subgraph state --
class InterviewState(MessagesState):
    # inherits: messages: Annotated[list[BaseMessage], add_messages]
    max_num_turns: int
    context: Annotated[list, operator.add]  # search results accumulate here
    analyst: Analyst
    interview: str                          # formatted transcript
    sections: list                          # completed section (sent back to outer graph)
```
Each analyst runs as an independent subgraph. The subgraph is a mini ReAct agent that interviews an "expert" (another LLM call that plays the expert role) and searches the web.
```python
from typing import Literal

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.tools import tool
from langgraph.graph import END, START, StateGraph
from langgraph.prebuilt import ToolNode

# Assumes `llm` (a chat model) and `InterviewState` are already defined.


@tool
def web_search_tool(query: str) -> str:
    """Search the web for information on a topic."""
    return f"[Search results for '{query}']"  # Replace with real search API


interview_tools = [web_search_tool]
interview_llm = llm.bind_tools(interview_tools)


def generate_question(state: InterviewState) -> dict:
    """The analyst node: generates the next interview question."""
    analyst = state["analyst"]
    messages = [
        SystemMessage(content=(
            f"You are {analyst.name}, a {analyst.role} focused on {analyst.focus}. "
            f"You are interviewing an expert about the research topic. "
            f"Ask insightful questions that align with your analytical focus. "
            f"When you have enough information, say 'Thank you, that is all I needed.'"
        ))
    ] + state["messages"]
    # Use the plain llm here: only the expert has tools, so the analyst can
    # never leave an unanswered tool call in the message history.
    response = llm.invoke(messages)
    response.name = "analyst"  # tag the speaker so the transcript can label it
    return {"messages": [response]}


def generate_answer(state: InterviewState) -> dict:
    """The expert node: answers the analyst's question, using web search."""
    messages = [
        SystemMessage(content=(
            "You are an expert being interviewed. Answer thoroughly and factually. "
            "Use the web search tool when you need current data."
        ))
    ] + state["messages"]
    response = interview_llm.invoke(messages)
    return {"messages": [response]}


def save_interview(state: InterviewState) -> dict:
    """Formats the full Q&A exchange into a clean transcript string."""
    transcript = []
    for msg in state["messages"]:
        if not msg.content:
            continue  # skip tool-call-only messages with no text
        is_analyst = isinstance(msg, HumanMessage) or getattr(msg, "name", None) == "analyst"
        role = "Analyst" if is_analyst else "Expert"
        transcript.append(f"{role}: {msg.content}")
    return {"interview": "\n\n".join(transcript)}


def write_section(state: InterviewState) -> dict:
    """The final node: uses the interview transcript to write a report section."""
    analyst = state["analyst"]
    prompt = [
        SystemMessage(content=(
            f"Based on the following interview conducted by {analyst.name} ({analyst.role}), "
            f"write a structured report section focused on {analyst.focus}. "
            f"Be concise, factual, and cite specific points from the interview."
        )),
        HumanMessage(content=state["interview"]),
    ]
    response = llm.invoke(prompt)
    # sections is the key that is passed back to the outer graph
    return {"sections": [response.content]}


def route_interview(state: InterviewState) -> Literal["generate_answer", "save_interview"]:
    """Continue interviewing OR wrap up when analyst is satisfied or max turns hit."""
    last_message = state["messages"][-1]
    if (
        "thank you, that is all" in last_message.content.lower()
        or len(state["messages"]) >= state.get("max_num_turns", 6) * 2
    ):
        return "save_interview"
    return "generate_answer"


def route_answer(state: InterviewState) -> Literal["interview_tools", "generate_question"]:
    """Run the search tool if the expert asked for it, else hand back to the analyst."""
    last_message = state["messages"][-1]
    if getattr(last_message, "tool_calls", None):
        return "interview_tools"
    return "generate_question"


# -- INTERVIEW SUBGRAPH ASSEMBLY --
interview_builder = StateGraph(InterviewState)
interview_tool_node = ToolNode(interview_tools)

interview_builder.add_node("generate_question", generate_question)
interview_builder.add_node("generate_answer", generate_answer)
interview_builder.add_node("interview_tools", interview_tool_node)
interview_builder.add_node("save_interview", save_interview)
interview_builder.add_node("write_section", write_section)

interview_builder.add_edge(START, "generate_question")
interview_builder.add_conditional_edges(
    "generate_question",
    route_interview,
    {"generate_answer": "generate_answer", "save_interview": "save_interview"},
)
interview_builder.add_conditional_edges(
    "generate_answer",
    route_answer,
    {"interview_tools": "interview_tools", "generate_question": "generate_question"},
)
interview_builder.add_edge("interview_tools", "generate_answer")
interview_builder.add_edge("save_interview", "write_section")
interview_builder.add_edge("write_section", END)

interview_graph = interview_builder.compile()
```
```python
if __name__ == "__main__":
    # Assumes research_graph is the compiled outer graph, with a checkpointer
    # and an interrupt before the human_feedback node.
    config = {"configurable": {"thread_id": "research-001"}}

    # Step 1: Generate analysts (runs until the interrupt at human_feedback)
    research_graph.invoke(
        {
            "topic": "The impact of AI on the future of software engineering",
            "max_analysts": 3,
            "human_analyst_feedback": "",
            "analysts": [],
            "sections": [],
            "final_report": "",
        },
        config=config,
    )

    # Step 2: HITL - graph paused at human_feedback node
    while True:
        snapshot = research_graph.get_state(config)
        analysts = snapshot.values["analysts"]
        print("Generated analysts:")
        for a in analysts:
            print(f"  - {a.name} ({a.role}): {a.focus}")

        feedback = input(
            "\nProvide feedback to refine analysts (or press Enter to approve): "
        ).strip()
        if not feedback:
            # Approved: clear feedback and proceed to interviews
            research_graph.update_state(
                config, {"human_analyst_feedback": None}, as_node="human_feedback"
            )
            break
        # Human has feedback: inject it, regenerate the analysts, and pause
        # again at human_feedback so the new team can be reviewed
        research_graph.update_state(
            config, {"human_analyst_feedback": feedback}, as_node="human_feedback"
        )
        research_graph.invoke(None, config=config)

    # Step 3: Resume - N parallel interviews launch via Send, then report is compiled
    print("\nRunning parallel interviews...")
    final_result = research_graph.invoke(None, config=config)

    print("\n" + "=" * 60)
    print("FINAL RESEARCH REPORT")
    print("=" * 60)
    print(final_result["final_report"])
```
Use this to decide which pattern fits your situation:
Single agent getting confused between tasks → Supervisor Pattern. Split work into specialized sub-agents. The supervisor routes; specialists execute.
Tasks that are independent and slow → Parallelization (Fan-out / Fan-in). Run them simultaneously. Add operator.add reducers for shared fields.
Parallel branches that are themselves complex → Subgraphs. Compile each complex branch as its own graph and embed it as a node. Use output schemas to control what returns to the parent.
Number of parallel workers determined at runtime → Send API (Map-Reduce). Generate a list of Send objects from a router. Each Send spawns an independent worker with its own state payload.
A real production system → All of the above, combined. The research assistant at Level 5 uses every pattern together because a real task requires all of them.
This extends the keyword cards from Parts 1-3.
Multi-Agent Structure Keywords
supervisor_node: the orchestrating node. Reads state, writes next_agent, never does specialist work itself.
specialist_node: does one focused task with its own tools and system prompt. Always routes back to supervisor when done.
next_agent (str): the standard state field for supervisor routing. The supervisor writes a name; the router reads it.

Parallelization Keywords
Fan-out: multiple add_edge calls from the same source node. LangGraph detects this and runs targets concurrently.
Fan-in: multiple add_edge calls pointing to the same destination. Destination waits for all sources to complete.
Annotated[list, operator.add]: mandatory on any state field that multiple parallel nodes write to. Without this, parallel writes crash.
operator.add: the most common reducer for parallel patterns. Concatenates lists from concurrent nodes.

Subgraph Keywords
StateGraph(InternalState, output=OutputSchema): the two-argument form of StateGraph. The second argument filters which keys are returned to the parent (recent LangGraph releases spell this keyword output_schema).
Output schema: a TypedDict with only the keys the subgraph should expose to the parent. Keys absent from this schema are private to the subgraph.
Overlapping keys: the communication channel between parent and subgraph. Any key in both state schemas is automatically shared.
subgraph.compile(): seals the subgraph into a callable. After this, pass it to parent.add_node("name", compiled_subgraph).
xray=1: argument to graph.get_graph(xray=1).draw_mermaid(). Visualizes internal subgraph structure in the parent graph diagram.

Map-Reduce / Send Keywords
Send(node_name, state_dict): imported from langgraph.constants (newer releases also export it from langgraph.types). Represents a single parallel worker invocation. Does not need to match parent graph state.
[Send("node", {...}), Send("node", {...}), ...]: returning a list of Send objects from a routing function launches all of them in parallel.
["node_name"]: the third argument to add_conditional_edges when using Send. A list (not a dict) of valid destination node names, for graph validation only.
with_structured_output(PydanticModel): chains after an LLM to force structured JSON output that validates against a Pydantic schema. The standard pattern for supervisor decisions and analyst generation.

The jump from a single agent to a multi-agent system isn't really a jump at all: it's the same seven modules, applied multiple times and wired together. Every "agent" in a multi-agent system is just a graph, or a node, or a subgraph. The primitives don't change. The skill is knowing how to compose them.
The progression in this article followed a deliberate staircase: one supervisor, then parallel flat nodes, then complex parallel subgraphs, then dynamic parallelism with Send, then the full combination. Each step added exactly one new concept. If any step felt comfortable, that's the design working: each level builds cleanly on the one before.
The research assistant at Level 5 is genuinely close to what you'd find in a production multi-agent codebase. Human approval loops, parallel specialist agents, structured LLM output, dynamic worker spawning, and a final synthesis step. You now have a mental model and a working template for all of it.
With all four parts complete, you have the full production scaffold: canonical structure (Part 0) + memory management (Part 1) + human-in-the-loop safety (Part 2) + multi-agent orchestration (Part 3). These four articles together cover the architecture behind the vast majority of real-world LangGraph applications.
For other parts of the series: Part 0, Part 1, Part 2, Part 3, Part 4.
LangGraph Multi-Agent Systems: From One Brain to Many was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.