{"slug": "multi-agent-systems-in-production-when-one-agent-isn-t-enough-and-how-we-them", "title": "Multi-Agent Systems in Production: When One Agent Isn't Enough and How We Coordinate Them", "summary": "A developer team accidentally built a multi-agent system when a single-agent monolith grew unmanageable. They now use patterns like supervisor-worker, pipeline, and event-driven coordination to manage complexity, with structured data contracts between agents to avoid token waste and quality degradation.", "body_md": "We built our first \"multi-agent system\" by accident. What started as a single agent that could research a topic, draft a report, check it against source data, and send a summary email had grown into a 2,000-token system prompt and a function list so long that the model kept forgetting tools existed. It wasn't a system — it was a monolith pretending to be intelligent.\n\nBreaking it apart into coordinated agents fixed most of the problems. It also introduced a new category of problems we hadn't thought about. Here's what we actually learned.\n\nThe temptation to add more agents is real, but the overhead isn't free. Every agent boundary you add is a place where context can get lost, latency increases, and errors compound.\n\nOne agent is the right call when:\n\nYou need multiple agents when:\n\nThe key question we ask: *Is this one job or a pipeline of jobs?* If you'd describe it to a human as \"first do X, then Y takes that and does Z\", you probably have a pipeline, not a single task.\n\nA thin orchestrator agent decides what needs doing, dispatches to specialised worker agents, and stitches the results together. The workers are narrow — they do one thing and don't need to know about the rest of the workflow.\n\nThis is our most common pattern. The supervisor's system prompt stays small because it's routing, not reasoning. The workers' prompts can be highly optimised for their specific job.\n\nEach agent's output is the next agent's input. 
No orchestrator — just a chain. We use this for document processing: extract → chunk → summarise → classify. Each step is independent enough that we can swap out or retrain one without touching the others.\n\nAgents subscribe to events rather than being called directly. An intake agent processes a new customer request and emits an event; a triage agent picks it up, classifies it, and emits another; a response agent drafts the reply. We use this with Celery and Redis when the steps can happen asynchronously and we don't need the full chain to complete before responding to the user.\n\nHere's a simplified version of how we implement the supervisor pattern. The orchestrator Celery task manages the workflow; individual agent tasks do the actual LLM calls.\n\n``` python\n# tasks/orchestrator.py\nfrom celery import chain, shared_task\nfrom .agents import extract_data_task, analyse_data_task, draft_report_task\n\n@shared_task(bind=True, max_retries=3)\ndef run_report_pipeline(self, document_id: int, user_id: int):\n    \"\"\"\n    Supervisor: extract → analyse → draft, with error isolation at each step.\n    \"\"\"\n    try:\n        # Build the pipeline as a Celery chain; each task's return value\n        # becomes the first positional argument of the next task\n        pipeline = chain(\n            extract_data_task.s(document_id),\n            analyse_data_task.s(user_id=user_id),\n            draft_report_task.s(user_id=user_id),\n        )\n        result = pipeline.apply_async()\n        return {\"pipeline_id\": result.id, \"status\": \"started\"}\n\n    except Exception as exc:\n        # Only dispatch failures (e.g. broker unavailable) land here;\n        # retry with exponential backoff before giving up\n        raise self.retry(exc=exc, countdown=2 ** self.request.retries)\n```\n\nEach agent task is responsible for its own LLM call and its own error handling. The orchestrator doesn't need to know what model each agent uses, or whether agent two calls a tool — it just cares about the shape of the data passing between steps.\n\nThe naïve approach is to pass the full output of each agent directly into the next. 
This breaks down fast: LLM outputs are verbose, and feeding 3,000 tokens of analysis into a drafter that only needs 5 key facts wastes tokens and degrades quality.\n\nWe use a structured intermediate format — a plain Python dataclass or Pydantic model — as the contract between agents. Each agent's output is validated against this schema before it's passed downstream.\n\n``` python\nfrom celery import shared_task\nfrom pydantic import BaseModel\nfrom typing import Optional\n\nclass ExtractionResult(BaseModel):\n    document_id: int\n    key_facts: list[str]          # Max 10 bullet points\n    raw_data_summary: str          # Under 500 chars\n    confidence_score: float        # 0–1\n    extraction_warnings: list[str] # Anything the agent flagged\n\nclass AnalysisResult(BaseModel):\n    document_id: int\n    findings: list[str]\n    risk_flags: list[str]\n    recommended_actions: list[str]\n    analysis_notes: Optional[str] = None\n\n# In the extraction agent task.\n# call_llm and get_document_text are project helpers (not shown here).\n@shared_task\ndef extract_data_task(document_id: int) -> dict:\n    raw_output = call_llm(\n        system=\"You are a data extraction specialist...\",\n        user=get_document_text(document_id),\n        response_format=ExtractionResult,  # Structured output enforced\n    )\n    # Raises pydantic.ValidationError if the output does not match the schema\n    result = ExtractionResult.model_validate(raw_output)\n    return result.model_dump()  # Celery serialises as dict\n```\n\nEnforcing the schema at the boundary means your analysis agent never has to guess what the extraction agent gave it. When something breaks, the error is at the boundary where it belongs, not buried three steps later.\n\nThe hardest part of multi-agent systems is failure handling. In a monolithic agent, one failure terminates one task. In a pipeline, a failure in step two means you've wasted step one and need to decide whether to retry from the start or from step two.\n\nOur approach centres on a `PipelineRun` model with status fields for each step. 
This lets us resume partial pipelines and gives us visibility into where things are breaking.\n\n``` python\n# models.py\nfrom django.db import models\n\n# Assumes a Document model is defined earlier in this module\nclass PipelineRun(models.Model):\n    document = models.ForeignKey(Document, on_delete=models.CASCADE)\n    status = models.CharField(max_length=20, default='pending')\n\n    # Checkpointed results per step\n    extraction_result = models.JSONField(null=True)\n    analysis_result = models.JSONField(null=True)\n    draft_result = models.JSONField(null=True)\n\n    # Step-level status\n    extraction_status = models.CharField(max_length=20, default='pending')\n    analysis_status = models.CharField(max_length=20, default='pending')\n    draft_status = models.CharField(max_length=20, default='pending')\n\n    error_detail = models.TextField(blank=True)\n    created_at = models.DateTimeField(auto_now_add=True)\n    updated_at = models.DateTimeField(auto_now=True)\n```\n\nThis makes debugging a failed pipeline actually feasible. You open the admin, find the `PipelineRun`, see which step failed, and read the error. Without this, you're parsing Celery logs hoping something tells you what happened.\n\nMulti-agent architectures solve real problems — context overflow, specialisation, parallelism, and failure isolation. But they introduce coordination overhead that a single agent doesn't have. You're trading simplicity for scalability and resilience.\n\nThe things this doesn't solve: it won't fix a poorly designed system prompt on an individual agent, it won't save you if your task decomposition is wrong, and it adds latency. Every agent boundary is a round-trip to an LLM.\n\nStart with one agent. Add a second when you have a clear reason — not because it sounds more impressive. The moment you're debugging why agent three hallucinated because agent two gave it a vague extraction result, you'll appreciate the value of simple.\n\nWe run multi-agent pipelines in production for document processing, automated research workflows, and customer triage. 
They work well, but every one of them started life as a single agent that we only split apart when we had a concrete reason.\n\n[Lycore builds production AI systems](https://www.lycore.com/ai-development-services/) for businesses — we design and implement multi-agent pipelines, RAG systems, and LLM integrations that hold up in production. [Get in touch](https://www.lycore.com/contact-us/) if you want to talk through your use case.", "url": "https://wpnews.pro/news/multi-agent-systems-in-production-when-one-agent-isn-t-enough-and-how-we-them", "canonical_source": "https://dev.to/lycore/multi-agent-systems-in-production-when-one-agent-isnt-enough-and-how-we-coordinate-them-2m20", "published_at": "2026-06-28 15:19:00+00:00", "updated_at": "2026-06-28 15:33:51.185350+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-agents", "developer-tools", "mlops"], "entities": ["Celery", "Redis", "Pydantic"], "alternates": {"html": "https://wpnews.pro/news/multi-agent-systems-in-production-when-one-agent-isn-t-enough-and-how-we-them", "markdown": "https://wpnews.pro/news/multi-agent-systems-in-production-when-one-agent-isn-t-enough-and-how-we-them.md", "text": "https://wpnews.pro/news/multi-agent-systems-in-production-when-one-agent-isn-t-enough-and-how-we-them.txt", "jsonld": "https://wpnews.pro/news/multi-agent-systems-in-production-when-one-agent-isn-t-enough-and-how-we-them.jsonld"}}