Fallacies of GenAI Development #8: More AI Agents Means More Productivity

A developer's series on generative AI fallacies concludes that deploying multiple AI agents on a single codebase does not linearly scale productivity, because the agents form a distributed system that introduces coordination failures. Without shared protocols and specifications, adding more agents leads to integration conflicts, contradictory patterns, and architectural drift rather than increased throughput. The post argues that the assumption "more agents equals more productivity" is false, as each agent's local decisions create invisible inconsistencies that compound over time.

This is the eighth and final post in a series on the false assumptions teams make when building with generative AI. The series began with the observation that the trough of disillusionment for AI-assisted development has arrived, not because AI is useless, but because eight false assumptions made the trough inevitable. This post covers the last assumption and closes the series.

"If one AI agent gives us a 10x boost, ten agents will give us 100x."

The arithmetic feels irresistible. One agent generates code for the backend. Another generates the frontend. A third writes tests. A fourth handles database migrations. A fifth generates documentation. Each agent works in parallel. No meetings, waiting or coordination overhead. Pure throughput.

Leadership sees the potential: a five-person team with fifty agents has the output of a fifty-person team at the cost of a five-person team plus API credits. The scaling is linear. The economics are transformational.

And the early results confirm it. Each agent, working on its own, produces impressive output. The backend agent generates Go code. The frontend agent generates React components. The test agent generates test suites. Each agent, in isolation, looks like a 10x developer.

You've seen this problem before. It has a name. It's called distributed systems.

A distributed system is a collection of independent actors that must coordinate to produce a coherent result. Each actor makes decisions locally. The system's correctness depends on those local decisions being compatible globally. When they aren't, you get inconsistency, conflicts, data corruption, and cascading failures.

AI agents working on the same codebase are a distributed system. Each agent makes decisions: variable names, error handling strategies, retry policies, data formats, abstraction levels, dependency choices. Each decision is made locally, in the context of one prompt, one file, one task. No agent sees the full picture.
No agent coordinates with the other agents. Each agent's decisions are invisible to the others.

Distributed systems engineers spent 40 years learning that you can't scale a distributed system just by adding more nodes. You scale it by adding protocols: consensus mechanisms, ordering guarantees, conflict resolution rules, interface contracts. Without protocols, more nodes means more conflicts, not more throughput.

The same applies to AI agents. More agents without specifications means more invisible decisions, more inconsistency, and more cognitive fragmentation, not more productivity.

Month 1-2: The parallel sprint. Five agents work simultaneously on different parts of the system. Each produces well-structured code. PRs flow in from every direction. The team merges them rapidly. The system takes shape faster than anyone expected.

Month 3-4: The integration cracks. The backend agent chose camelCase for JSON field names. The frontend agent expected snake_case. The database migration agent used PascalCase for column names. None knew about the others' choices. Each was reasonable in isolation. The integration fails silently: data flows through, but field mappings are wrong. The bug appears as "the UI shows the wrong value" and takes two days to trace to a naming mismatch across three layers.

Month 5-6: The contradictory patterns. The backend agent implemented retries with exponential backoff and jitter. The API gateway agent implemented retries with a fixed 3-second delay. Both are valid retry strategies. Both were generated from different training data patterns. When a downstream service is slow, the backend retries with increasing delays while the gateway retries every 3 seconds. If the gateway sits in front of the backend, each gateway retry can start a fresh round of backend retries, and the combined traffic becomes a thundering herd that overwhelms the already-slow service. The agents' patterns contradicted each other. Neither knew the other existed.

Month 7-8: The architectural drift. Each agent, given different tasks over months, evolved different internal patterns.
The backend agent started using result types for error handling. The frontend agent uses exceptions. The test agent mixes both depending on which prompt it received. The codebase has three error handling philosophies, each locally consistent, globally incoherent. A new developer opens the codebase and can't determine which pattern is correct because all three exist in production.

Month 9-10: The edit war. Agent A refactors a shared utility function for performance. Agent B, in a separate task, refactors the same function for readability. Agent A's change merges first. Agent B's change overwrites A's optimization. Neither agent knows the other touched the file. The team pays for the token cost of both refactors and gets the result of neither.

Adam Bender of Google has a name for this. In his talk Software Engineering at the Tipping Point, he calls it the agentic edit war: two agents refactoring the same file toward different goals, livelocking the system while the company pays for the tokens on both sides.

Worse, without a pattern specification, the cycle repeats. Agent A sees the unoptimized code and refactors it again for performance. Agent B sees the unreadable code and refactors it again for readability. The agents consume tokens indefinitely, bouncing the code back and forth between two valid-but-contradictory goals. In distributed systems, this is called livelock: the system is active but making no progress. In AI development, it's called your API bill.

Month 11-12: The coordination collapse. The team needs to make an architectural change: migrate from REST to gRPC. Each agent needs to be told individually. Each interprets the migration prompt differently. The backend agent generates gRPC server code. The frontend agent keeps generating REST client calls because its prompt wasn't updated. The test agent generates tests for both protocols because it sees both in the codebase.
The migration that should take a week takes two months because every agent is working against the others.

The team discovers they've been managing a distributed system without the protocols that distributed systems require. Each agent is a node. Each node is making decisions. Nobody built the consensus layer.

Hold up the two systems side by side:

Distributed Computing (1990s):          Distributed AI Development (2020s):
────────────────────────────────────    ────────────────────────────────────
Multiple processes on multiple nodes    Multiple agents on one codebase
Each makes local decisions              Each makes local decisions
No shared state by default              No shared context by default
Inconsistency is the default outcome    Inconsistency is the default outcome

Distributed computing solved this with protocols:

Protocol                    What it solves                 AI equivalent
─────────────────────────   ────────────────────────────   ──────────────────────────────
Consensus (Paxos, Raft)     Agreement on shared state      Shared specification repo
Ordering (vector clocks)    Event sequencing               Architectural priority rules
Conflict resolution         Concurrent modifications       Specification gate on merge
Interface contracts (IDL)   Cross-service compatibility    API contracts + contract tests
Schema evolution            Backward compatibility         Migration specifications
Circuit breakers            Cascading failure prevention   Dependency scope limits

Every protocol in the left column has an AI development equivalent in the right column. The solutions exist. They're called specifications, contracts, and enforcement gates. They're the same coordination mechanisms, applied to agents instead of processes.

Problem: Each agent chooses its own naming conventions.

Fix: A convention specification fed to every agent as context. "JSON fields: camelCase. Database columns: snake_case. Go structs: PascalCase. Environment variables: UPPER_SNAKE." Four lines. Every agent reads them. Every change is checked against them by a linter. A naming inconsistency fails the lint check instead of reaching the main branch.
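
A minimal sketch of what "checked against them by a linter" could look like for the four-line convention above. This is an illustration under stated assumptions, not a real linter: the `CONVENTIONS` table, the kind labels (`json_field`, `db_column`, `go_struct`, `env_var`), and the `violations` helper are hypothetical names, and a real check would first have to extract identifiers from source files, which is left out here.

```python
import re

# Hypothetical encoding of the convention specification quoted above.
# Each kind of identifier maps to the pattern its names must match in full.
CONVENTIONS = {
    "json_field": re.compile(r"[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*"),  # camelCase
    "db_column": re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*"),       # snake_case
    "go_struct": re.compile(r"(?:[A-Z][a-z0-9]*)+"),                 # PascalCase
    "env_var": re.compile(r"[A-Z][A-Z0-9]*(?:_[A-Z0-9]+)*"),         # UPPER_SNAKE
}


def violations(names):
    """Return the (kind, name) pairs whose name breaks the convention for its kind."""
    return [
        (kind, name)
        for kind, name in names
        if not CONVENTIONS[kind].fullmatch(name)
    ]
```

Given pairs such as `("json_field", "user_id")` or `("db_column", "UserId")`, `violations` returns them as failures, while `("json_field", "userId")` and `("db_column", "user_id")` pass. A CI job could fail the build whenever the returned list is non-empty.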

Critical: the specification must be immutable for the duration of concurrent agent tasks. In distributed systems, "split brain" happens when nodes have different versions of the truth. If Agent A has the old naming convention and Agent B has the new one, the codebase gets both. Version the specification. Update it between task batches, not during them.

Problem: Each agent implements different patterns for the same concern.

Fix: One ADR per cross-cutting concern. "ADR-007: Retry strategy is exponential backoff with jitter, base 100ms, max 5 attempts, across ALL services." The specification is the protocol. The CI check is the enforcement. Any agent that generates fixed-delay retries fails the build.

Problem: Two agents modify the same file with contradictory goals.

Fix: CODEOWNERS + module boundaries. Each module has one owner (human or agent). Cross-module changes require the interface contract to be satisfied. Agents can't modify modules outside their scope. Same principle as microservice boundaries, but for agents.

Problem: Three error handling philosophies coexist in the codebase.

Fix: One interface contract: "All public functions return (result, error). No exceptions. No panics. No sentinel values." Enforced by the compiler (Go) or by a linter rule (TypeScript). The contract is the specification. The tool is the enforcement.

Problem: An architectural migration is interpreted differently by each agent.

Fix: A migration specification: "Phase 1: Add gRPC endpoints alongside REST. Phase 2: Migrate clients to gRPC. Phase 3: Remove REST. Current phase: 1. Agents must not remove REST endpoints." Each agent reads the current phase. The specification prevents agents from jumping ahead or falling behind.

Each fix is small. Each is a few lines of text. Each is fed to agents as context AND enforced mechanically by CI. The context helps agents make correct decisions. The enforcement catches them when they don't.
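
To make one of these fixes concrete, here is a minimal sketch of the retry policy quoted in the ADR-007 example (exponential backoff with jitter, base 100ms, max 5 attempts). The ADR text does not say which jitter variant or growth factor to use, so the "full jitter" choice (a uniform delay between zero and the exponential cap) and the doubling factor are assumptions, as are the names `backoff_delay` and `call_with_retry`. The `sleep` function is passed in so the policy can be exercised without real waiting.

```python
import random

# Values quoted from the ADR-007 example above.
BASE_DELAY_S = 0.1  # base 100ms
MAX_ATTEMPTS = 5    # max 5 attempts in total


def backoff_delay(retry_index, rng=random):
    """Delay before retry `retry_index` (0-based).

    Assumption: full jitter, i.e. uniform between 0 and a cap that doubles per retry.
    """
    cap = BASE_DELAY_S * (2 ** retry_index)
    return rng.uniform(0.0, cap)


def call_with_retry(operation, sleep, rng=random):
    """Run `operation` up to MAX_ATTEMPTS times, sleeping a jittered delay after each failure."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            return operation()
        except Exception:
            if attempt == MAX_ATTEMPTS - 1:
                raise  # attempts exhausted: surface the last error
            sleep(backoff_delay(attempt, rng))
```

In production code `sleep` would typically be `time.sleep`. If every service shares one helper like this instead of each agent generating its own retry loop, the fixed-delay variant has nowhere to come from.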

In distributed computing, protocols enable coordination without requiring every node to understand every other node. Node A doesn't need to know Node B's implementation. It needs to know Node B's interface contract. If both nodes respect the protocol, the system is consistent, regardless of how many nodes you add.

The same principle applies to AI agents. Agent A doesn't need to know Agent B's prompt. It needs to know the specification that governs the module it's working on. If all agents respect the specifications, the codebase is consistent, regardless of how many agents you add.

Distributed computing: Protocol enables coordination between nodes
Distributed AI dev:    Specification enables coordination between agents

Distributed computing: More nodes + same protocol = more throughput
Distributed AI dev:    More agents + same specifications = more productivity

Distributed computing: More nodes + no protocol = more conflicts
Distributed AI dev:    More agents + no specifications = more inconsistency

Scaling agents is scaling a distributed system. The mechanisms are the same. The lesson is the same. The solution is the same.

Eight fallacies. One meta-pattern.

Fallacy 1: Faster generation = faster engineering → The leading sub-system outran the lagging ones
Fallacy 2: Looks correct = is correct → Plausibility is not correctness
Fallacy 3: AI can verify AI → Correlated failure modes don't converge
Fallacy 4: Drop review = remove bottleneck → Removing a gate without replacing it removes the safety net
Fallacy 5: Better context = correct output → Input quality doesn't guarantee output correctness
Fallacy 6: Generated code is an asset → Code is a liability; capability is the asset
Fallacy 7: Specs are new work → The specifications already exist; the enforcement doesn't
Fallacy 8: More agents = more productivity → More actors without protocols = more conflicts

Every fallacy stems from one root assumption: generating the output is the hard part. This assumption is wrong.
Understanding the output, verifying it, maintaining it, coordinating the actors that produce it, and preserving the rationale for why it's shaped this way: those are the hard parts. They always were. AI made the easy part faster. The hard parts didn't change.

Peter Deutsch's Distributed Computing Fallacies worked because they named the assumptions that every developer made, discovered were wrong, and paid for in production. The network is not reliable. Latency is not zero. Bandwidth is not infinite.

The Fallacies of GenAI Development work the same way. Generation is not engineering. Plausible is not correct. More agents is not more productivity. Each assumption sounds true. Each leads to failure. Each has already been resolved by a domain that learned the lesson first.

The resolution is the same across every fallacy, every domain, every era:

- Recognize the specifications that already exist in your system: types, contracts, schemas, boundaries (Fallacy 7).
- Enforce them mechanically on every change, at machine speed (Fallacies 1, 3, 4).
- Verify the output against declared properties, not just the input quality (Fallacies 2, 5).
- Measure properties verified, not code generated (Fallacy 6).
- Use specifications as coordination protocols for agents (Fallacy 8).

One architectural principle. Eight applications. The teams that adopt it first will emerge from the trough of disillusionment ahead of everyone else. The teams that don't will learn the same lessons the hard way, the same way distributed systems developers learned Deutsch's fallacies: one production incident at a time.

The engineer's role has changed. Not from "writing code" to "writing prompts"; that's the Fallacy 4 trap, the prompt operator with no ownership. The real shift is from writing code to designing protocols: the specifications, contracts, boundaries, and enforcement gates that enable agents to coordinate safely. The engineer becomes the protocol designer for a distributed system of AI actors.
That's not a demotion from "programmer." It's the same move the industry made when it went from writing assembly to designing systems. The abstraction level changed. The engineering judgment became more valuable, not less.

The trough is real. The exit is specifications: recognized, enforced, and verified.

This concludes The Fallacies of GenAI Development. The complete series: 1. Faster Generation ≠ Faster Engineering · 2. Plausible ≠ Correct · 3. AI Can't Verify AI · 4. Dropping Review ≠ Removing Bottleneck · 5. Better Context ≠ Correct Output · 6. Generated Code Is a Liability · 7. Specifications Already Exist · 8. More Agents ≠ More Productivity

For cloud infrastructure specifically, the specification-first model is implemented in Stave, an open-source tool that evaluates configuration snapshots against 2,662 safety invariants with deterministic mechanical verification. Apache 2.0.

For a single IAM policy file, try iam-explain: point it at your policy JSON, see what the math says. The specifications are already in your policy. The tool shows you what they mean.

The Fallacies of GenAI Development were inspired by Peter Deutsch's "Fallacies of Distributed Computing" (1994). The resolution draws from Parnas (1972), Altshuller's TRIZ (1946), Byron Cook's automated reasoning at AWS, and evidence from aviation, nuclear operations, financial trading, and Google's monorepo. Each fallacy was discovered independently across domains. The convergence is the evidence.