Plan 3×, Build Once: How Three Models Plan What One Model Ships

A developer is using three AI models to plan a single software feature, then having one of them implement it. Opus 4.7 and GPT-5.5 each design a plan independently, Gemini 3.1 Pro audits both and selects the best combination, and GPT-5.5 codes the final product. The approach has shipped three one-shot builds so far, at roughly $2–4 of planning per project against an implementation pass that costs $20–60.

I asked Opus 4.7 and GPT-5.5 to independently plan the same new feature. Then I handed both plans to Gemini 3.1 Pro and asked it to pick the winning combination. Here is the sentence that earned the work:

"Opus designed a consumer SaaS application (beautiful, real-time, client-side, targeting writers). GPT-5.5 designed a B2B API marketing tool (pragmatic, server-tied, conversion-focused, targeting engineers/QA). Because your goal is to sell API keys for VeracityAPI, GPT-5.5 has the winning product strategy, but Opus has the winning technical architecture."

Each model produced a half-right plan. Single-model planning would have shipped one of them. Gemini's audit produced a synthesis neither model would have written alone.

For every meaningfully complex build, I plan in three models. Opus 4.7 plans it. GPT-5.5 plans it. Gemini 3.1 Pro audits both and picks the Goldilocks. Then GPT-5.5 one-shots the implementation against the reconciled plan.

- The 3-model loop has shipped three one-shot builds so far: the Veracity Chrome extension, the Palmaura backend, and now the VeracityAPI Text Linter.
- The receipts: Opus over-indexes on architecture for the wrong audience. GPT-5.5 over-indexes on conversion with the wrong technical primitives. Gemini reliably picks the half each model got right.
- Three planning passes cost ~$2–4 in API spend. The implementation pass they save costs ~$20–60. The math is straightforward once you've run it.
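
For anyone who wants to run that math, here is a minimal sketch in Python. The dollar ranges come from the figures above. Everything else is an assumption made for illustration: the function names, the idea that planning avoids exactly one wasted implementation pass, and the pairing of the most expensive planning with the cheapest implementation as the "worst case".

```python
# Figures from the article (USD). The scenario pairing below is an assumption.
PLANNING_COST = (2.0, 4.0)          # three planning passes: low, high
IMPLEMENTATION_COST = (20.0, 60.0)  # one implementation pass: low, high


def net_savings(planning: float, implementation: float, avoided_passes: int = 1) -> float:
    """Money saved if planning avoids `avoided_passes` implementation passes."""
    return implementation * avoided_passes - planning


def break_even_probability(planning: float, implementation: float) -> float:
    """Minimum chance of avoiding one pass for planning to pay off in expectation."""
    return planning / implementation


# Worst case: priciest planning, cheapest implementation pass avoided.
worst_case = net_savings(PLANNING_COST[1], IMPLEMENTATION_COST[0])
# Best case: cheapest planning, priciest implementation pass avoided.
best_case = net_savings(PLANNING_COST[0], IMPLEMENTATION_COST[1])
worst_break_even = break_even_probability(PLANNING_COST[1], IMPLEMENTATION_COST[0])

print(f"net savings per build: ${worst_case:.0f} to ${best_case:.0f}")
print(f"break-even at worst: {worst_break_even:.0%} chance of avoiding a redo")
```

Under these assumptions, planning pays for itself in expectation if it prevents a wasted implementation pass at least one time in five, even when planning is at its most expensive and the implementation pass at its cheapest.
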
This isn't a science. It's three builds. But the pattern is sharp enough that I now refuse to single-model-plan anything bigger than a weekend project.

The loop

For every meaningfully complex build, I run this:

- Opus 4.7 plans it. Fresh context. The plan tends to optimize for architectural elegance and developer experience.
- GPT-5.5 plans it. Fresh context. The plan tends to optimize for product strategy and conversion.
- Gemini 3.1 Pro audits both. Picks the winning combination. Flags what each model missed. Patches the synthesis.

Then GPT-5.5 one-shots the implementation against the Gemini-reconciled plan, stepping through the milestones with subagent-driven development.

The first time I tried this I was skeptical. Three planning passes felt like overhead. By the third build I stopped feeling silly. The Goldilocks plan is the artifact I needed all along.

The receipt: VeracityAPI Text Linter

I'm shipping a new feature on VeracityAPI (https://veracityapi.com): an AI text linter at /tools/style-editor. The decision tree was loaded:

- UX: Hemingway-style editor? Static form? Agent skill suite published to GitHub?
- Deployment: Cloudflare Worker only? Open-source repo? Both?
- Architecture: contenteditable? Textarea overlay? No editor at all, just JSON?
- Positioning: Writers? Developers? Enterprise trust teams?

Each model anchored on its own priors. Here is what they each produced.

What Opus 4.7 proposed

Opus produced two framings (I gave it two passes with different lead-in prompts):

Plan A: open-source agent skill suite. Publish to GitHub as veracityapi/ai-text-risk-skills. Five skills: ensemble detector, author-style-compare, provenance-probe, source-claim-audit, benchmark harness. Heavy emphasis on multi-signal risk reports, "evidence not accusations." Local model execution. BYOK for search providers.

Plan B: Hemingway-style editor. Live editor at /tools/style-editor. contenteditable + overlay