{"slug": "from-three-agents-to-ten-what-we-learned-scaling-an-autonomous-ai-workforce", "title": "From Three Agents to Ten: What We Learned Scaling an Autonomous AI Workforce", "summary": "A team of autonomous AI agents at a company grew from three to ten members by spring 2026, adding roles including QA engineer, security reviewer, and project manager. The expansion caused four major failures: lost cross-agent communications, git conflicts from overlapping file ownership, unpredictable session turn budgets, and context fragmentation between sessions. The team resolved these issues by implementing a file-based inbox system, a shared git wrapper with standardized commits, per-session turn limits, and structured session summaries shared between agents through LoreConvo.", "body_md": "This is a follow-up to our agent workforce case study, which describes the original architecture: three autonomous agents (a researcher, an architect, and a builder) running on a shared pipeline with human review gates at every transition. You can see that system at /work/agent-workforce.\n\nWhat the case study does not cover is what happened next.\n\n## The Growth Problem\n\nThe original three agents had clean, non-overlapping roles. The researcher scanned for opportunities every Monday. The architect turned approved opportunities into designs twice a week. The builder implemented approved designs daily. Communication happened entirely through a pipeline database and a set of status flags. It was simple because there were only three moving parts.\n\nBy spring 2026, the team had grown to ten agents. A QA engineer validated the builder's code. A security reviewer scanned for vulnerabilities. A project manager synthesized overnight activity into a daily dashboard. A marketing agent drafted blog posts and social content. A tech docs agent kept product documentation current. A forecasting agent tracked 60-90 day horizon signals. A competitive intelligence agent monitored competitor moves. 
The original architect role split into two: one for new product proposals, one for reviewing existing shipped code.\n\nEach new agent was justified on its own terms. QA catches bugs the builder misses. Security catches vulnerabilities QA doesn't look for. Documentation is always the first thing to slip in a fast-moving project. But adding each agent exposed coordination gaps that three agents had managed to avoid.\n\n## What Broke\n\nThe first failure mode was communication. The original three-agent system had no mechanism for an agent to file an ad-hoc issue. Agents could only report through their scheduled output. If the builder hit an unexpected dependency conflict mid-session, the only option was to note it in the session summary and hope someone noticed. By the time the team had ten agents, this became a real problem: issues filed in session summaries were getting lost. An agent would flag something that needed another agent's attention, but there was no routing mechanism. The flag sat in a report file that nobody was watching in real time.\n\nThe fix was a file-based inbox system. Each agent now has a directory at `raw/agents/<role>/inbox/`. Any agent can drop a handoff file there. The consuming agent reads its inbox at session start and processes items in priority order. The handoff file format is standardized: a YAML header with ref, from, to, action, and priority, followed by a prose description. A pipeline database tracks every handoff so nothing gets lost. This replaced the original approach of burying cross-agent requests in session output files.\n\nThe second failure mode was git conflicts. In the three-agent system, each agent owned its output files exclusively. The builder wrote code; the researcher wrote opportunity reports; the architect wrote proposals. The files didn't overlap. As roles expanded, that separation broke down. The QA engineer and the builder both needed to write to test files in the same directory. 
The documentation agent needed to edit files that the builder had just touched. Without a coordinated git workflow, concurrent agent sessions would produce conflicts that required manual resolution.\n\nThe fix was a shared git wrapper that all agents use for commits. Every commit identifies the agent, uses a standardized message format, and runs pre-commit validation. Agents don't commit directly to the repo; they go through the wrapper, which handles staging, hygiene checks, and the commit itself. This also solved an audit trail problem: it became easy to see what each agent had done in a given session by filtering commit history by agent identity.\n\nThe third failure mode was turn budget chaos. The original system had no per-session limits. An agent session could run 50 turns or 500 turns depending on how complex the work was. Sessions that ran unexpectedly long were expensive and often lower quality: a 400-turn session produced worse output than a focused 60-turn session working from a clear inbox item. One session ran to 300+ turns working on a single refactor that would have been better split across two sessions.\n\nThe fix was mandatory turn budgets: a soft cap that triggers wrap-up behavior and a hard cap that terminates the session. Each agent has a different budget based on the complexity of its typical tasks. The builder has a higher budget than the marketing agent because code changes require more context. The budgets are set in a shared configuration, not hardcoded per agent, so they can be tuned based on observed session behavior. Session monitoring tracks turn usage and flags sessions that approach the soft cap so the agent can file any open items for the next session.\n\nThe fourth failure mode was context fragmentation. Early in the project, each agent started from zero every session. The builder had no memory of what the QA engineer had flagged the previous night. 
The marketing agent had no context on what features the builder had shipped that week. Agents would rediscover the same facts, repeat the same analysis, and occasionally contradict each other's earlier work because they were working from different information states.\n\nLoreConvo was built specifically to solve this problem. Each agent saves a structured session summary to its own local LoreConvo instance at the end of every run. Cross-agent coordination happens through an explicit export and import mechanism: an agent exports selected sessions to JSON, and the receiving agent imports them at session start via `loreconvo merge`. The builder imports what QA flagged overnight. The marketing agent imports what the builder shipped this week. The project manager imports summaries from every agent and synthesizes them into the morning dashboard. It is not a centralized shared database; each agent manages its own local store and shares context deliberately. That structured, deliberate handoff is what turns ten independent agents into a coordinated team.\n\n## The Scheduling Architecture Change\n\nThe original agents ran on Cowork scheduled tasks. This worked for three agents but created a practical problem at ten: Cowork scheduled tasks run in cloud VMs, and those VMs have a file system mount that silently misroutes new-file writes. Specifically, any new file created via Python's normal file creation path went to `/tmp` on the VM rather than the actual repo. Edits to existing files worked correctly; new files did not.\n\nFor three agents with well-defined output directories, this was manageable. For ten agents producing handoff files, QA reports, security reports, architecture proposals, and marketing drafts, the silent failure mode was a serious problem. Sessions would appear to complete successfully, but the handoff files would not be in the inbox because they had been written to /tmp instead.\n\nThe fix was moving all agents to launchd on the local Mac. 
LaunchAgents run with full filesystem credentials, write access to the repo, proper environment variable access, and no VM abstraction layer. The agents have the same scheduled behavior they had before, but the execution environment is consistent with what they were designed for. The Cowork task records still exist for each agent but with scheduling disabled to prevent double-triggering.\n\n## The Governance Layer\n\nGovernance is the part most teams skip until something goes wrong. We skipped it initially and paid the cost.\n\nThe specific failures that forced governance were: an agent leaking internal project names into customer-facing content; an agent spending a session working on a task that a different agent had already completed; two agents committing to the same file in the same window and producing a merge conflict; an agent filing a ticket in a format that a downstream agent couldn't parse.\n\nThe governance structures that addressed these were not elaborate. A shared LEARNINGS.md file in the repo root captures one-line rules that have been learned from failure. Every agent reads it at session start. Rules are written concretely: not \"be careful about agent names\" but \"never use any internal agent name in customer-facing content, including blog posts, social posts, and product descriptions.\" When a failure happens, the rule goes in the file immediately so it propagates to every agent in the next session.\n\nSession start and end scripts enforce mandatory workflow: at start, the agent reads its inbox, recent LoreConvo sessions, and LEARNINGS.md. At end, it files any open handoffs, commits its work, and saves a structured session summary to LoreConvo. These scripts run automatically; agents cannot skip them. The session summary template includes a COMPLETED section (what was shipped), a HANDOFFS section (what was routed to other agents), and a BLOCKED section (what couldn't be finished and why).\n\nA required consumer field enforces routing discipline. 
Every handoff and pipeline item must declare which role consumes it. That is the same `to` field from the inbox handoff format described above, and it is validated at creation time, so you cannot file an item without naming a valid consumer. The consumer field is what determines which agent's inbox the item lands in. Ticket prefixes only namespace the work into a few broad categories; a prefix is an identifier, not the routing mechanism. This eliminated the \"I filed it, but nobody saw it\" failure mode.\n\n## What We Would Do Differently\n\nThe two changes that would have saved the most work if implemented from the start:\n\n**File-based inbox communication from day one.** The original pipeline-state-only communication pattern was fine for three agents on a linear pipeline. It broke at five agents. The inbox system should have been built when the second agent was added, not as a retrofit at agent six.\n\n**Mandatory session start/end scripts from the first session.** The context fragmentation problem got worse the longer agents ran without structured handoffs. Every session that completed without filing a structured summary was a session whose decisions were partially invisible to the next agent. Retrofitting the session workflow was harder than building it into the first agent template.\n\n## The Result\n\nThe current system runs ten agents on automated schedules, most of them daily. The builder ships code every morning. The QA engineer validates it that evening. The security reviewer scans for vulnerabilities overnight. The project manager synthesizes everything into a morning dashboard. The marketing agent drafts and routes content three times a week. Documentation stays current with each shipped feature. 
The forecasting and competitive intelligence agents provide the strategic context the architect needs for new proposals.\n\nLoreConvo and LoreDocs, which were built as internal coordination tools for this agent team, are now published products on the Anthropic marketplace. The agent team that was built to develop products now uses those products as infrastructure. LoreConvo's cross-surface persistence and project-scoped memory were designed specifically for the multi-agent coordination problem described in this post.\n\nThe human role in the system is oversight, prioritization, and the decisions that require judgment: approving architecture proposals, reviewing content before it goes public, setting priorities when two agents want to work on the same problem. Everything else is handled on schedule, every day, without manual triggering.\n\nIf you are building or scaling a multi-agent system and want to talk through the governance and coordination architecture, visit /services or reach out at /contact.", "url": "https://wpnews.pro/news/from-three-agents-to-ten-what-we-learned-scaling-an-autonomous-ai-workforce", "canonical_source": "https://labyrinthanalyticsconsulting.com/blog/from-three-agents-to-ten-ai-workforce-scaling", "published_at": "2026-06-03 00:00:00+00:00", "updated_at": "2026-06-03 03:04:43.387145+00:00", "lang": "en", "topics": ["ai-agents", "ai-products", "ai-tools", "ai-startups", "ai-research"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/from-three-agents-to-ten-what-we-learned-scaling-an-autonomous-ai-workforce", "markdown": "https://wpnews.pro/news/from-three-agents-to-ten-what-we-learned-scaling-an-autonomous-ai-workforce.md", "text": "https://wpnews.pro/news/from-three-agents-to-ten-what-we-learned-scaling-an-autonomous-ai-workforce.txt", "jsonld": "https://wpnews.pro/news/from-three-agents-to-ten-what-we-learned-scaling-an-autonomous-ai-workforce.jsonld"}}