{"slug": "harness-engineering", "title": "Harness Engineering", "summary": "Blake Aber of Predicate Ventures argues that production AI success depends 90% on the 'harness' (observability, evals, rollback, and silent-failure detection) and only 10% on the model itself. He warns that enterprise AI pilots often fail at month nine because the harness was never built, even though the model still works. The key to scaling is investing in infrastructure that makes models observable, auditable, and recoverable in production.", "body_md": "*Why production AI is 90% harness and 10% model. And what most pilots miss.*\n\n**Blake Aber** · Predicate Ventures · 2026\n\nThe gap between an AI demo that lands in a steering-committee deck and an AI system that stays in production isn't the model. It's the harness: the standing infrastructure around the model that makes it observable, auditable, and recoverable when things go sideways. Roughly 90% of what determines whether a pilot ships is harness work. About 10% is model capability. Most enterprise AI programs invert this ratio in budgeting, staffing, and attention.\n\nThis is why pilots that pass validation quietly die in production at month nine. The model still works at month nine. The harness was never built.\n\n**Observability.** Not a dashboard. The standing infrastructure that runs the deployed model against named scenarios continuously, tracks output drift over time, and flags when the input distribution has shifted enough that the validation set no longer covers production reality. Observability is what turns *\"the model passed at week zero\"* into *\"the model is still passing.\"* Without it, the model is a snapshot, not a system.\n\n**Evals.** Not the test set used at training time. The eval harness that runs in production, against real workloads, on a cadence that catches regressions before they accumulate. 
Evals must be cheap enough to run continuously, structured enough to detect specific failure modes, and owned by someone. Not \"the platform team.\" A specific, named person whose job is to look at the dashboard. Without ownership, evals decay.\n\n**Rollback.** Not the model-version manifest. The operational mechanism that takes a deployed model out of production within a defined window (minutes, not weeks) when something goes wrong. Rollback includes the human authority (who can pull the model), the technical mechanism (how the model is pulled), the fallback path (what happens to in-flight requests), and the post-rollback audit (what's reconstructed afterward). Most programs treat rollback as something they'll figure out if needed. The programs that ship treat it as the first thing they design.\n\n**Silent-failure detection.** The hardest of the four. Models fail in two ways: catastrophically (everyone notices) and silently (the model produces plausible-looking output that's wrong). Silent failure is strictly worse than a crash. Users learn to trust the model on the 90% it gets right and never notice the 10% it gets wrong until the audit. Silent-failure detection requires instrumenting the *boundary* between the model and the downstream consumer: what the model said, what the user did with it, and whether the action's outcome matched the model's confidence. Without this instrumentation, the model can be wrong for months before anyone notices.\n\nMost enterprise AI initiatives quietly die at month nine. Not at validation. Validation passed at week zero. Month nine is when:\n\nThe pilot didn't fail because the model got worse. It failed because the harness around the model was never built, and the model's environment changed.\n\nThis is the failure mode that doesn't show up in vendor demos, because those demos run against the static dataset the model was tuned on. It doesn't show up in pilot reports because pilots end before month nine. 
It shows up at month ten, in the steering-committee meeting where someone asks why the metrics are flat and discovers that the model has been silently wrong for a quarter.\n\nThe programs that ship at scale make four decisions before the model is selected:\n\nThese four decisions aren't engineering nice-to-haves. They're the difference between a pilot that ships and a pilot that quietly dies at month nine.\n\nTwo pressures are converging. **First**, model capability is plateauing across vendors. The differentiator at production scale shifts from *which model you select* to *which harness you've built*. Once you accept that, the next question is how the harness scales across many workflows, which is where [spec-driven orchestration](https://www.predicate.ventures/writing/specs-not-prompts) replaces bespoke prompt scaffolding. The orgs with harness discipline can deploy any model safely. The orgs without it can't deploy even the best model safely. **Second**, regulatory expectations are tightening. Auditors are asking design-input questions, not just review-end questions. Programs without explicit harness artifacts will discover this the expensive way.\n\nClosing the harness engineering gap is the structural advantage in production AI over the next 24 months. For PE firms, this is the case for [proving AI internally before pushing it to the portfolio](https://www.predicate.ventures/writing/internal-first): the firms investing in real observability, evals, rollback, and silent-failure detection now will be running at a different operational tempo by 2027. 
The orgs treating those primitives as post-deployment retrofits will be running reactive rebuilds.", "url": "https://wpnews.pro/news/harness-engineering", "canonical_source": "https://dev.to/blake_aber_f8c344d227aa82/harness-engineering-1nmg", "published_at": "2026-06-29 15:34:19+00:00", "updated_at": "2026-06-29 15:49:29.771455+00:00", "lang": "en", "topics": ["artificial-intelligence", "ai-infrastructure", "ai-safety", "mlops", "ai-products"], "entities": ["Blake Aber", "Predicate Ventures"], "alternates": {"html": "https://wpnews.pro/news/harness-engineering", "markdown": "https://wpnews.pro/news/harness-engineering.md", "text": "https://wpnews.pro/news/harness-engineering.txt", "jsonld": "https://wpnews.pro/news/harness-engineering.jsonld"}}