102. Multi-Agent Systems: When One Agent Is Not Enough

A developer demonstrated how multi-agent systems can overcome the limitations of single AI agents by using specialized agents that work together on complex tasks. The approach uses patterns like orchestrator-worker, sequential pipeline, and critic-review to coordinate multiple AI agents with different roles and system prompts. The implementation includes a Python framework with base agent classes and communication protocols that enable agents to delegate work, check each other's output, and collaborate toward shared goals.

One agent is powerful but limited. Ask it to research a topic, write an article, review that article, check the code examples, and format everything for publishing. It has to do everything sequentially. When it makes a mistake in step 2, it might not catch it until step 7. It has one perspective. One "voice." One set of strengths and weaknesses.

Now imagine three specialized agents working on the same task. A research agent that searches exhaustively and compiles sources. A writing agent that takes those sources and drafts the article with a clear structure. A review agent that reads the draft critically and flags errors, gaps, and unsupported claims. Each one knows its job deeply. They check each other's work. They have different system prompts that give them different strengths.

This is how complex knowledge work actually gets done. Not one person doing everything. A team of specialists coordinated toward a shared goal. Multi-agent systems bring this pattern to AI.

```python
import os
import json
import re
import time
from typing import List, Dict, Callable, Optional, Any
from dataclasses import dataclass, field
from enum import Enum

import anthropic

client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))


class Pattern(Enum):
    ORCHESTRATOR_WORKER = "orchestrator_worker"
    SEQUENTIAL_PIPELINE = "sequential_pipeline"
    PARALLEL_EXECUTION = "parallel_execution"
    DEBATE = "debate"
    CRITIC_REVIEW = "critic_review"


print("Multi-Agent Patterns:")
print()

patterns = {
    "Orchestrator-Worker": {
        "description": "One LLM breaks down tasks, delegates to specialized workers, aggregates results",
        "best_for": "Complex tasks that can be decomposed into subtasks",
        "example": "Research assistant: orchestrator delegates to researcher, writer, editor"
    },
    "Sequential Pipeline": {
        "description": "Output of one agent becomes input to the next in a fixed chain",
        "best_for": "Multi-stage transformation: draft → edit → format → publish",
        "example": "Content pipeline: researcher → writer → fact-checker → publisher"
    },
    "Parallel Execution": {
        "description": "Multiple agents work simultaneously on independent subtasks",
        "best_for": "Tasks with independent components that can run concurrently",
        "example": "Market research: agent A covers Asia, agent B covers Europe simultaneously"
    },
    "Debate/Adversarial": {
        "description": "Two agents argue opposing positions, a judge evaluates and decides",
        "best_for": "Decision-making, fact-checking, reducing overconfidence",
        "example": "Agent A argues for approach X, Agent B argues against, judge decides"
    },
    "Critic-Review": {
        "description": "Creator agent produces output, critic agent evaluates and gives feedback",
        "best_for": "Quality assurance, catching blind spots, improving output quality",
        "example": "Writer produces article, critic identifies weaknesses, writer revises"
    },
}

for name, info in patterns.items():
    print(f"  {name}:")
    print(f"    {info['description']}")
    print(f"    Best for: {info['best_for']}")
    print(f"    Example:  {info['example']}")
    print()


@dataclass
class AgentMessage:
    from_agent: str
    to_agent: str
    content: str
    message_type: str = "task"
    metadata: Dict = field(default_factory=dict)


class BaseAgent:
    """Foundation agent that all specialized agents inherit from."""

    def __init__(self, name: str, role: str, system_prompt: str,
                 model: str = "claude-3-5-haiku-20241022",
                 tools: Optional[List[Dict]] = None):
        self.name = name
        self.role = role
        self.system_prompt = system_prompt
        self.model = model
        self.tools = tools or []
        self.history: List[AgentMessage] = []

    def think(self, message: str, context: Optional[List[Dict]] = None,
              max_tokens: int = 1000) -> str:
        # Copy the context so the caller's list is never mutated.
        messages = list(context or [])
        messages.append({"role": "user", "content": message})

        kwargs = {
            "model": self.model,
            "max_tokens": max_tokens,
            "system": self.system_prompt,
            "messages": messages,
        }
        if self.tools:
            kwargs["tools"] = self.tools

        response = client.messages.create(**kwargs)

        if response.stop_reason == "tool_use":
            return self._handle_tool_use(response, messages, max_tokens)

        return response.content[0].text if response.content else ""

    def _handle_tool_use(self, response, messages, max_tokens):
        # Echo the assistant turn, run every requested tool, then ask again.
        messages.append({"role": "assistant", "content": response.content})
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = self._execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(result)
                })
        messages.append({"role": "user", "content": tool_results})

        final = client.messages.create(
            model=self.model,
            max_tokens=max_tokens,
            system=self.system_prompt,
            messages=messages,
            tools=self.tools
        )
        return final.content[0].text if final.content else ""

    def _execute_tool(self, tool_name: str, tool_input: Dict) -> Any:
        # Subclasses override this to provide real tools.
        return {"error": f"Tool {tool_name} not implemented in {self.name}"}

    def __repr__(self):
        return f"Agent({self.name}, role={self.role})"


print("BaseAgent class built.")


class OrchestratorAgent(BaseAgent):
    """Breaks down complex goals and delegates to specialized workers."""

    def __init__(self, workers: List[BaseAgent]):
        super().__init__(
            name="Orchestrator",
            role="coordinator",
            system_prompt=f"""You are an orchestrator that delegates tasks to specialized agents.

Available workers:
{self._format_workers(workers)}

To delegate a task, respond with JSON:
{{
  "delegations": [
    {{"agent": "agent_name", "task": "specific task description", "priority": 1}},
    ...
  ],
  "execution": "sequential" or "parallel"
}}

After receiving worker results, synthesize them into a final coherent answer."""
        )
        self.workers = {w.name: w for w in workers}

    def _format_workers(self, workers):
        return "\n".join(
            f"- {w.name} ({w.role}): handles {w.role} tasks" for w in workers
        )

    def run(self, goal: str, verbose: bool = True) -> str:
        if verbose:
            print(f"\n{'=' * 60}")
            print(f"Orchestrator Goal: {goal}")
            print(f"{'=' * 60}")

        plan_prompt = f"""Goal: {goal}

Create a delegation plan. Which agents should handle which parts?
Respond with the JSON delegation format."""

        plan_json = self.think(plan_prompt)

        # Fallback plan: hand the whole goal to the first worker.
        plan = {
            "delegations": [
                {"agent": list(self.workers.keys())[0], "task": goal, "priority": 1}
            ],
            "execution": "sequential",
        }
        try:
            plan = json.loads(plan_json)
        except json.JSONDecodeError:
            # The model may wrap the JSON in prose; try the outermost {...} span.
            match = re.search(r'\{.*\}', plan_json, re.DOTALL)
            if match:
                try:
                    plan = json.loads(match.group())
                except json.JSONDecodeError:
                    pass  # keep the fallback plan

        if verbose:
            print(f"\nPlan: {plan.get('execution', 'sequential')} execution")
            for d in plan.get("delegations", []):
                print(f"  → {d['agent']}: {d['task'][:60]}")

        worker_results = {}
        for delegation in plan.get("delegations", []):
            agent_name = delegation["agent"]
            task = delegation["task"]

            if agent_name in self.workers:
                if verbose:
                    print(f"\n  {agent_name} working on: {task[:50]}...")
                result = self.workers[agent_name].think(task)
                worker_results[agent_name] = result
                if verbose:
                    print(f"  {agent_name} done: {result[:100]}...")

        synthesis_prompt = f"""Original goal: {goal}

Worker results:
{json.dumps(worker_results, indent=2)}

Synthesize these results into a single, coherent, well-structured answer."""

        final_answer = self.think(synthesis_prompt)
        return final_answer


research_agent = BaseAgent(
    name="Researcher",
    role="research",
    system_prompt="""You are a research specialist. Your job is to find and synthesize information.
Always cite sources, be thorough, and organize findings clearly.
Present information as bullet points with key facts highlighted."""
)

writer_agent = BaseAgent(
    name="Writer",
    role="writing",
    system_prompt="""You are a technical writer. Your job is to turn research into clear, engaging prose.
Write in an accessible but precise style. Structure content with clear headings and logical flow.
Target audience: developers and data scientists."""
)

critic_agent = BaseAgent(
    name="Critic",
    role="review",
    system_prompt="""You are a critical reviewer. Your job is to find flaws and gaps.
Be constructive but rigorous. Identify:
- Factual errors or unsupported claims
- Missing important information
- Unclear or confusing passages
- Structural improvements needed
Score quality 1-10 and explain your rating."""
)

orchestrator = OrchestratorAgent(
    workers=[research_agent, writer_agent, critic_agent]
)

print("\nOrchestrator-Worker system ready.")
print(f"Workers: {list(orchestrator.workers.keys())}")

result = orchestrator.run(
    "Explain the key differences between BERT and GPT, including their architectures, "
    "training objectives, and best use cases.",
    verbose=True
)
print(f"\nFinal Answer:\n{result[:500]}...")


class Pipeline:
    """Agents run in sequence, output flows to next agent as input."""

    def __init__(self, agents: List[BaseAgent], verbose: bool = True):
        self.agents = agents
        self.verbose = verbose
        self.outputs = {}

    def run(self, initial_input: str) -> str:
        current = initial_input

        for i, agent in enumerate(self.agents):
            if self.verbose:
                print(f"\n  Stage {i+1}/{len(self.agents)} [{agent.name}]")
                print(f"  Input: {current[:80]}...")

            # The first stage gets the raw input; later stages get the
            # previous output plus their own role as the task description.
            prompt = (
                f"Previous stage output:\n{current}\n\nYour task: {agent.role}"
                if i > 0 else current
            )
            current = agent.think(prompt)
            self.outputs[agent.name] = current

            if self.verbose:
                print(f"  Output: {current[:80]}...")

        return current


draft_agent = BaseAgent(
    name="Drafter",
    role="Write a first draft. Do not worry about perfection, focus on getting ideas down.",
    system_prompt="You are a first-draft writer. Write quickly and completely. Cover all the key points."
)

editor_agent = BaseAgent(
    name="Editor",
    role="Edit the draft for clarity, concision, and flow. Fix any awkward sentences.",
    system_prompt="You are a skilled editor. Improve clarity and remove redundancy while preserving meaning."
)

formatter_agent = BaseAgent(
    name="Formatter",
    role="Format the edited content with proper markdown, headers, and structure.",
    system_prompt="You are a content formatter. Add appropriate markdown formatting, headers, and bullet points."
)
```
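The chaining rule inside `Pipeline.run` can be checked without spending any tokens. The following is a minimal sketch under stated assumptions: `StubAgent`, `run_pipeline`, and `body_of` are hypothetical names introduced for illustration, not part of the post's framework, and each stub's `think()` calls a local function instead of the API.

```python
from typing import Callable, List

PREFIX = "Previous stage output:\n"


class StubAgent:
    """Hypothetical offline stand-in for BaseAgent: think() runs a local function."""

    def __init__(self, name: str, role: str, fn: Callable[[str], str]):
        self.name = name
        self.role = role
        self._fn = fn
        self.seen: List[str] = []  # prompts this agent received, for inspection

    def think(self, message: str) -> str:
        self.seen.append(message)
        return self._fn(message)


def run_pipeline(agents: List[StubAgent], initial_input: str) -> str:
    """Same chaining rule as Pipeline.run: stage 1 gets the raw input, later
    stages get the previous output plus their own role as the task."""
    current = initial_input
    for i, agent in enumerate(agents):
        if i > 0:
            prompt = f"{PREFIX}{current}\n\nYour task: {agent.role}"
        else:
            prompt = current
        current = agent.think(prompt)
    return current


def body_of(prompt: str) -> str:
    """Recover the previous stage's output from a wrapped prompt."""
    if prompt.startswith(PREFIX):
        return prompt[len(PREFIX):].split("\n\nYour task:")[0]
    return prompt


drafter = StubAgent("Drafter", "Write a first draft.", lambda p: f"draft of [{p}]")
editor = StubAgent("Editor", "Edit the draft.", lambda p: body_of(p).upper())
formatter = StubAgent("Formatter", "Format as markdown.", lambda p: "# " + body_of(p))

final = run_pipeline([drafter, editor, formatter], "backprop")
print(final)           # the fully transformed text
print(editor.seen[0])  # the exact prompt the second stage received
```

Inspecting `seen` shows what each stage was actually asked, which is usually the first thing to check when a real pipeline produces odd output.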
```python
pipeline = Pipeline(
    agents=[draft_agent, editor_agent, formatter_agent],
    verbose=True
)

print("\nSequential Pipeline: Draft → Edit → Format")
final = pipeline.run(
    "Write a brief explanation of how neural networks learn through backpropagation."
)
print(f"\nFinal formatted output:\n{final[:400]}...")
```

```python
import concurrent.futures
import threading

# Reuses time, the typing imports, and BaseAgent from the first block.


class ParallelAgentRunner:
    """Run multiple agents simultaneously on independent subtasks."""

    def __init__(self, agents_and_tasks: List[tuple],
                 max_workers: int = 4, verbose: bool = True):
        self.agents_and_tasks = agents_and_tasks
        self.max_workers = max_workers
        self.verbose = verbose
        self._lock = threading.Lock()  # keeps progress lines from interleaving

    def run(self) -> Dict[str, str]:
        results = {}
        start = time.time()

        def run_agent(agent_task_pair):
            agent, task = agent_task_pair
            if self.verbose:
                with self._lock:
                    print(f"  → {agent.name} started: {task[:50]}...")
            result = agent.think(task)
            if self.verbose:
                with self._lock:
                    print(f"  ✓ {agent.name} done ({time.time() - start:.1f}s)")
            return agent.name, result

        with concurrent.futures.ThreadPoolExecutor(max_workers=self.max_workers) as executor:
            futures = {
                executor.submit(run_agent, pair): pair
                for pair in self.agents_and_tasks
            }
            for future in concurrent.futures.as_completed(futures):
                name, result = future.result()
                results[name] = result

        elapsed = time.time() - start
        if self.verbose:
            print(f"\nAll agents completed in {elapsed:.1f}s total")

        return results


asia_agent = BaseAgent(
    "Asia Researcher", "researcher",
    "You research the Asian tech market. Focus on China, Japan, South Korea, India."
)
europe_agent = BaseAgent(
    "Europe Researcher", "researcher",
    "You research the European tech market. Focus on UK, Germany, France, Nordics."
)
us_agent = BaseAgent(
    "US Researcher", "researcher",
    "You research the US tech market. Focus on Silicon Valley, NYC, emerging hubs."
)
```
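Threads help here because each agent call spends most of its time waiting on the network. The fan-out can be exercised offline first. This is a rough sketch, not the post's runner: `SleepyAgent` and `fan_out` are hypothetical names, and `time.sleep` stands in for API latency.

```python
import concurrent.futures
import time
from typing import Dict, List, Tuple


class SleepyAgent:
    """Hypothetical stub: sleeps to imitate API latency, then echoes the task."""

    def __init__(self, name: str, delay: float = 0.2):
        self.name = name
        self.delay = delay

    def think(self, task: str) -> str:
        time.sleep(self.delay)
        return f"{self.name} report on: {task}"


def fan_out(pairs: List[Tuple[SleepyAgent, str]], max_workers: int = 4) -> Dict[str, str]:
    """Run each (agent, task) pair in its own thread; collect results by agent name."""
    results: Dict[str, str] = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(agent.think, task): agent.name for agent, task in pairs}
        for future in concurrent.futures.as_completed(futures):
            results[futures[future]] = future.result()
    return results


pairs = [(SleepyAgent(f"Region{i}"), f"topic {i}") for i in range(3)]

start = time.perf_counter()
results = fan_out(pairs)
parallel_s = time.perf_counter() - start

start = time.perf_counter()
serial = {agent.name: agent.think(task) for agent, task in pairs}
serial_s = time.perf_counter() - start

print(f"parallel: {parallel_s:.2f}s, serial: {serial_s:.2f}s")
```

The parallel time should sit near the slowest single agent rather than the sum of all three. Against a real API the speedup is also bounded by rate limits, so `max_workers` is worth tuning rather than maximizing.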
```python
import re  # used by CriticReviewLoop below

topic = "the adoption and trends in AI/ML technology in 2024"

parallel_runner = ParallelAgentRunner(
    agents_and_tasks=[
        (asia_agent, f"Research {topic} in Asia"),
        (europe_agent, f"Research {topic} in Europe"),
        (us_agent, f"Research {topic} in the United States"),
    ],
    verbose=True
)

print("\nParallel Execution: 3 regional researchers running simultaneously")
parallel_results = parallel_runner.run()

synthesizer = BaseAgent(
    name="Synthesizer",
    role="synthesis",
    system_prompt="You synthesize multiple research reports into one coherent global overview."
)

global_report = synthesizer.think(
    "Synthesize these regional research reports into a global overview:\n\n" +
    "\n\n".join(f"=== {name} ===\n{result}" for name, result in parallel_results.items())
)
print(f"\nGlobal synthesis:\n{global_report[:400]}...")


class DebateSystem:
    """Two agents argue opposing sides, a judge evaluates."""

    def __init__(self, model: str = "claude-3-5-haiku-20241022"):
        self.proposer = BaseAgent(
            name="Proposer",
            role="advocate",
            system_prompt="""You are an advocate for the proposition.
Make the strongest possible case FOR the position you are assigned.
Use evidence, logic, and compelling arguments. Be persuasive.""",
            model=model
        )
        self.opponent = BaseAgent(
            name="Opponent",
            role="critic",
            system_prompt="""You are a critic of the proposition.
Make the strongest possible case AGAINST the position presented.
Find flaws, gaps, counterexamples, and alternative views. Be rigorous.""",
            model=model
        )
        self.judge = BaseAgent(
            name="Judge",
            role="arbitrator",
            system_prompt="""You are an impartial judge evaluating a debate.
Assess both sides fairly. Identify the strongest arguments from each side.
Make a reasoned final verdict with clear justification.
Format: FOR arguments, AGAINST arguments, Verdict, Reasoning.""",
            model=model
        )

    def debate(self, proposition: str, rounds: int = 2, verbose: bool = True) -> Dict:
        if verbose:
            print(f"\nDebate: '{proposition}'")
            print("=" * 60)

        # Each context holds complete user/assistant pairs, so it stays a
        # valid history (starting with a user turn) when rounds > 1.
        context_p, context_o = [], []
        for_args, against_args = [], []

        for round_num in range(1, rounds + 1):
            if verbose:
                print(f"\n--- Round {round_num} ---")

            prop_prompt = f"Round {round_num}: Argue FOR: '{proposition}'"
            if against_args:
                # Fold the opponent's last rebuttal into the new user turn.
                prop_prompt = f"Opponent says: {against_args[-1]}\n\n{prop_prompt}"
            prop_arg = self.proposer.think(prop_prompt, context=context_p)
            context_p.append({"role": "user", "content": prop_prompt})
            context_p.append({"role": "assistant", "content": prop_arg})
            for_args.append(prop_arg)
            if verbose:
                print(f"FOR: {prop_arg[:150]}...")

            opp_prompt = (
                f"Round {round_num}: Argue AGAINST '{proposition}' "
                f"by countering this argument:\n{prop_arg}"
            )
            opp_arg = self.opponent.think(opp_prompt, context=context_o)
            context_o.append({"role": "user", "content": opp_prompt})
            context_o.append({"role": "assistant", "content": opp_arg})
            against_args.append(opp_arg)
            if verbose:
                print(f"AGAINST: {opp_arg[:150]}...")

        all_args = "\n\n".join(
            [f"FOR:\n{arg}" for arg in for_args] +
            [f"AGAINST:\n{arg}" for arg in against_args]
        )

        verdict = self.judge.think(
            f"Proposition: '{proposition}'\n\nDebate arguments:\n{all_args}\n\nDeliver your verdict."
        )

        if verbose:
            print(f"\nJudge's Verdict:\n{verdict[:300]}...")

        return {
            "proposition": proposition,
            "for_arguments": for_args,
            "against_arguments": against_args,
            "verdict": verdict
        }


debate = DebateSystem()
result = debate.debate(
    proposition="Large Language Models will replace most software engineering jobs within 10 years",
    rounds=1,
    verbose=True
)


class CriticReviewLoop:
    """Creator produces, critic evaluates, loop until quality threshold met."""

    def __init__(self, creator: BaseAgent, critic: BaseAgent,
                 max_iterations: int = 3, quality_threshold: float = 8.0):
        self.creator = creator
        self.critic = critic
        self.max_iterations = max_iterations
        self.quality_threshold = quality_threshold

    def run(self, task: str, verbose: bool = True) -> Dict:
        history = []
        feedback = ""

        for iteration in range(1, self.max_iterations + 1):
            if verbose:
                print(f"\n--- Iteration {iteration} ---")

            creation_prompt = (
                f"{task}\n\nFeedback from previous attempt:\n{feedback}\nImprove accordingly."
                if feedback else task
            )

            content = self.creator.think(creation_prompt)
            history.append({"iteration": iteration, "content": content})

            if verbose:
                print(f"  [{self.creator.name}]: {content[:120]}...")

            critique = self.critic.think(
                "Evaluate this content. Start with 'Score: N/10', then give feedback:"
                f"\n\n{content}"
            )

            if verbose:
                print(f"  [{self.critic.name}]: {critique[:120]}...")

            # Prefer an explicit "N/10"; otherwise take the first 0-10 integer.
            # If nothing parses, assume 7.0 so the loop keeps iterating.
            score_match = (
                re.search(r'(\d+(?:\.\d+)?)\s*/\s*10', critique)
                or re.search(r'\b(10|[0-9])\b', critique)
            )
            score = float(score_match.group(1)) if score_match else 7.0

            if score >= self.quality_threshold:
                if verbose:
                    print(f"\n✓ Quality threshold reached (score={score})")
                break

            feedback = critique

        return {
            "final_content": content,
            "iterations": iteration,
            "history": history
        }


code_writer = BaseAgent(
    name="CodeWriter",
    role="code creator",
    system_prompt="You write clean, well-documented Python code. Include docstrings and type hints."
)

code_reviewer = BaseAgent(
    name="CodeReviewer",
    role="code critic",
    system_prompt="""You review Python code rigorously. Check for:
- Correctness and edge cases
- Code clarity and documentation
- PEP 8 compliance
- Error handling
Score 1-10 and give specific actionable feedback."""
)

review_loop = CriticReviewLoop(
    creator=code_writer,
    critic=code_reviewer,
    max_iterations=3,
    quality_threshold=8.0
)

print("\nCritic-Review Loop: write and improve code iteratively")
result = review_loop.run(
    "Write a Python function that finds the longest palindrome substring in a string."
)
print(f"\nFinal code after {result['iterations']} iteration(s):")
print(result["final_content"][:400])

print("\nWhen to Use Multi-Agent Systems:")
print()

use_cases = {
    "Use multi-agent when": [
        "Tasks naturally decompose into specialized subtasks",
        "Quality requires multiple independent perspectives",
        "Parallel execution would save significant time",
        "Different parts of the task need different 'personalities' or constraints",
        "One agent's output quality is not good enough and critique helps",
        "Tasks exceed a single context window",
    ],
    "Stick with single agent when": [
        "Task is straightforward and fits one context window",
        "Coordination overhead would outweigh the benefits",
        "You need predictable, debuggable behavior",
        "Latency is critical (multi-agent adds round trips)",
        "Budget is tight (each agent call costs tokens)",
        "You are still prototyping (complexity kills iteration speed)",
    ],
}

for category, points in use_cases.items():
    print(f"  {category}:")
    for point in points:
        print(f"    {'✓' if 'Use' in category else '✗'} {point}")
    print()

print("Essential Multi-Agent Reference Links:")
print()

refs = {
    "Papers": [
        ("Society of Mind (Minsky, 1986)", "en.wikipedia.org/wiki/Society_of_Mind"),
        ("LLM-based Multi-Agent Survey", "arxiv.org/abs/2402.01680"),
        ("AutoGen: Multi-agent conversations", "arxiv.org/abs/2308.08155"),
        ("MetaGPT: Meta programming agents", "arxiv.org/abs/2308.00352"),
        ("ChatDev: Software development agents", "arxiv.org/abs/2307.07924"),
    ],
    "Frameworks": [
        ("AutoGen (Microsoft)", "github.com/microsoft/autogen"),
        ("CrewAI", "crewai.com"),
        ("LangGraph (stateful graphs)", "langchain-ai.github.io/langgraph"),
        ("Semantic Kernel (Microsoft)", "learn.microsoft.com/semantic-kernel"),
        ("Agency Swarm", "github.com/VRSEN/agency-swarm"),
        ("Camel-AI", "github.com/camel-ai/camel"),
    ],
    "Tutorials": [
        ("Anthropic multi-agent cookbook", "github.com/anthropics/anthropic-cookbook/tree/main/patterns/agents"),
        ("DeepLearning.AI Multi-agent course", "learn.deeplearning.ai/multi-ai-agent-systems"),
        ("LangGraph multi-agent tutorial", "langchain-ai.github.io/langgraph/tutorials"),
        ("AutoGen docs and examples", "microsoft.github.io/autogen"),
    ],
    "Blog Posts": [
        ("Lilian Weng: LLM Powered Autonomous Agents", "lilianweng.github.io/posts/2023-06-23-agent"),
        ("Andrej Karpathy: Software 2.0", "karpathy.medium.com/software-2-0-a64152b37c35"),
        ("Anthropic: Building effective agents", "anthropic.com/research/building-effective-agents"),
    ],
}

for category, links in refs.items():
    print(f"  {category}:")
    for name, url in links:
        print(f"    • {name:<48} {url}")
    print()
```

Create multi_agent_practice.py.

Part 1: implement the orchestrator-worker pattern from scratch. Create three specialized agents: a researcher (mock web search), a summarizer, and a formatter. Give the orchestrator a goal like "Research and summarize the key concepts of reinforcement learning." Verify it delegates appropriately.

Part 2: build a sequential pipeline with four stages. Stage 1: brainstorm 10 ideas for a blog post on a technical topic. Stage 2: select the best three and outline each. Stage 3: write one paragraph for each. Stage 4: format into a complete post with headings.

Part 3: implement the critic-review loop. Write a code generation task (sort algorithm, data structure, utility function). Run 3 iterations of write-critique-improve. Does the code quality measurably improve across iterations?

Part 4: debate two real technical positions. Example: "Python is better than JavaScript for backend development." Run two rounds. Print both sides' arguments and the judge's verdict.
Does the debate surface arguments you had not considered?

Agents need memory to be truly useful across sessions. The next post covers agent memory systems: how to store past actions, how to recall relevant past experience, and how to build agents that improve over time rather than starting fresh every conversation.