2026-06-18 (Thu) · Thematic brief — blog.r-lopes.com
The Core Claim
Vibe coding (prompting an agent and shipping output you may not read) carries a measurable defect tax that prototypes can absorb but mission-critical systems cannot: roughly 45% of AI-generated code contains security flaws Source 42, and the practice "boosts velocity but removes critical checks producing insecure code at scale" Source 16. The failure is structural, not stylistic: it strips out the SDLC stages (spec, test, review, audit trail) that exist precisely so correctness and accountability survive contact with production Source 26. Serious engineering does not abandon AI; it re-inserts those gates as spec-driven development, adversarial verification, and named human accountability Source 18, Source 42.
Evidence
1. AI-generated code fails security at a measured, repeatable rate. This is the hardest number in the corpus: the defect classes are not random but concentrated in security and logic — XSS at 2.74× and logic errors at 1.75× human baselines — exactly the classes that matter for auth, payments, and untrusted input.
"Approximately 45% of AI-generated code contains security flaws." —
[Source 42]
2. The mechanism is removal of checks, not generation of bad code. Vibe coding skips tests, reviews, CI, and documentation, and independent telemetry confirms the downstream effect: Veracode attributes a rise in severe vulnerabilities to "more generative AI coding," with 11.3% of vulnerabilities ranked severe versus 8.3% the prior year Source 22.
"Vibe coding boosts velocity but removes critical checks producing insecure code at scale." —
[Source 16]
3. The risk compounds because of three intrinsic agent properties. Speed outpaces review, non-determinism defeats reproduction, and cost pressure encourages cutting verification — a combination that turns a fast prototype loop into an unauditable liability when pointed at production.
"Willison warns of three properties that make AI agents dangerous: speed (they work faster than you can review), non-determinism (same input, different outputs), and cost (encouraging corner-cutting on verification)." —
[Source 18]
4. Auditability collapses when the deciding logic is transient. Mission-critical systems must answer "why did it do that" after the fact; a prompt-and-ship mutation leaves no reproducible provenance of the decision that caused it.
"If a transaction mutates several objects in a database, it is difficult to tell after the fact what that transaction means." —
[Source 19]
5. Accountability cannot be delegated to the model. Even a perfectly passing build needs a human who owns it; this is why regulated and safety-critical roles persist regardless of automation quality Source 159.
"A computer can never be held accountable." —
[Source 42]
6. The failure mode is already in the wild. A vibe-coded ransomware strain (Sakari) generated an RSA key pair then discarded the private key, making its own encryption irreversible Source 108; a widely deployed framework shipped auth-bypass middleware that an attacker could skip with a guessable header Source 74.
"the company behind [Next.js] has been vibe coding its security logic recently because an attacker can just say no thank you to any [auth] checks and use your app without pain" —
[Source 74]
7. The replacement is spec-driven development plus proof-of-work verification. Instead of guessing implementation from a prompt, you contract behavior and constraints up front, then require evidence the code ran correctly. A green benchmark is not that evidence: HumanEval and MBPP check functional pass/fail but not security or quality Source 60.
"if you haven't seen the code do the right thing yourself, it doesn't work" —
[Source 47]
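One way to picture "contract first, then proof" is a spec written as data and checked by actually running the code. The sketch below is illustrative only: `applyDiscount`, `SpecCase`, and `runSpec` are hypothetical names, not anything from the cited sources, and a real spec would cover far more than four cases.

```typescript
// Hypothetical example: every name here is an assumption made for illustration.

type SpecCase = {
  name: string;
  input: { priceCents: number; percentOff: number };
  // Either an expected return value or an expectation that the call is rejected.
  expect: { kind: "returns"; value: number } | { kind: "throws" };
};

// The behavior under contract: integer cents in, integer cents out, never negative.
function applyDiscount(priceCents: number, percentOff: number): number {
  if (!Number.isInteger(priceCents) || priceCents < 0) {
    throw new RangeError("priceCents must be a non-negative integer");
  }
  if (!Number.isFinite(percentOff) || percentOff < 0 || percentOff > 100) {
    throw new RangeError("percentOff must be between 0 and 100");
  }
  return Math.round(priceCents * (1 - percentOff / 100));
}

// The spec: behavior plus constraints, written before any implementation is accepted.
const spec: SpecCase[] = [
  { name: "typical discount", input: { priceCents: 2000, percentOff: 25 }, expect: { kind: "returns", value: 1500 } },
  { name: "full discount is zero", input: { priceCents: 999, percentOff: 100 }, expect: { kind: "returns", value: 0 } },
  { name: "rejects discount over 100%", input: { priceCents: 1000, percentOff: 150 }, expect: { kind: "throws" } },
  { name: "rejects NaN from untrusted input", input: { priceCents: 1000, percentOff: NaN }, expect: { kind: "throws" } },
];

type Evidence = { name: string; passed: boolean };

// Run every case and record what was observed, so the result is evidence, not a claim.
function runSpec(cases: SpecCase[]): Evidence[] {
  return cases.map((c) => {
    try {
      const actual = applyDiscount(c.input.priceCents, c.input.percentOff);
      return { name: c.name, passed: c.expect.kind === "returns" && c.expect.value === actual };
    } catch {
      return { name: c.name, passed: c.expect.kind === "throws" };
    }
  });
}

const evidence = runSpec(spec);
const allPassed = evidence.every((e) => e.passed);
console.log(evidence, allPassed);
```

The point of keeping `evidence` as a value rather than a console message is that it can be attached to the pull request as the proof-it-works record. It says nothing about security on its own; the rejection cases only cover the constraints someone thought to write down.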
How It Works
flowchart TD
A[Intent] --> B{Stakes?}
B -->|Throwaway / self-only| C[Vibe code: prompt, accept all, ship]
B -->|Auth / payments / secrets| D[Write spec + constraints]
D --> E[Agent generates code]
E --> F[Human review + test + threat model]
F --> G{Proof it works?}
G -->|No| D
G -->|Yes| H[Merge with audit trail + named owner]
The diagram reads as a single gate: the only legitimate input to the prompt-and-ship path is work where the sole person harmed by a bug is you Source 41; everything touching auth, payments, or secrets routes through spec → review → proof → accountable merge, with each stage restoring a check vibe coding removed.
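The same routing can be sketched as a small merge gate. This is a simplification under stated assumptions: the `Change` fields and the `gate` function are hypothetical, and every failed check is collapsed into a single "rework" outcome that stands in for the flowchart's edge back to the spec step.

```typescript
// Hypothetical sketch of the flowchart as a merge gate. Field names are assumptions.

type Stakes = "throwaway" | "critical"; // "critical" = touches auth, payments, or secrets

interface Change {
  stakes: Stakes;
  hasSpec: boolean;        // spec and constraints written before generation
  humanReviewed: boolean;  // review, tests, and threat model done by a person
  proofItWorks: boolean;   // someone observed the code doing the right thing
  owner: string | null;    // named human accountable for the merge
}

type Decision =
  | { action: "ship" }
  | { action: "merge"; owner: string }
  | { action: "rework"; reason: string };

function gate(change: Change): Decision {
  // Throwaway / self-only work may take the prompt-and-ship path.
  if (change.stakes === "throwaway") return { action: "ship" };

  // Everything else must pass each restored check, in order.
  if (!change.hasSpec) return { action: "rework", reason: "no spec or constraints" };
  if (!change.humanReviewed) return { action: "rework", reason: "no human review, tests, or threat model" };
  if (!change.proofItWorks) return { action: "rework", reason: "no proof it works" };
  if (!change.owner) return { action: "rework", reason: "no named owner" };

  return { action: "merge", owner: change.owner };
}

// Usage: a payments change with no observed proof is sent back, not merged.
const decision = gate({
  stakes: "critical",
  hasSpec: true,
  humanReviewed: true,
  proofItWorks: false,
  owner: "dana",
});
console.log(decision);
```

Classifying `stakes` is the hard part and stays a human judgment; the function only makes the consequences of that judgment explicit.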
What This Means in Practice
On a high-traffic e-commerce stack, treat AI as a high-speed intern and adopt a PR contract: intent, proof-it-works, and an explicit risk tier flagging which parts were AI-generated Source 42. Move security-critical checks out of single points of failure: do auth/authorization directly in the page or route handler rather than trusting middleware as the only gate, which is the exact pattern that produced the framework auth-bypass class Source 74, Source 170. Keep vibe coding for what it is genuinely good at (scaffolding a CLI, prototyping a UI, or proving out an LCP/INP optimization before committing), but re-implement the shipped version with React 19 useTransition, server components, and Next.js streaming under spec and review rather than pasting the prototype into prod Source 7, Source 30. Gate dependencies with CI scanning, since slop-squatting and dependency-confusion prevalence rises with AI suggestions Source 57, and require a human threat-model pass before merge for any path handling untrusted input Source 42. The throughput win is real, but it shifts the bottleneck from typing to verification; budget for that rather than pretending it vanished Source 60.
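To make the "check in the handler, not only in middleware" advice concrete, here is a minimal sketch. It deliberately avoids any framework import: `IncomingRequest`, `getSession`, `refundHandler`, the in-memory session table, and the `x-internal-skip-auth` header are all hypothetical stand-ins, not the Next.js API.

```typescript
// Hypothetical types and names throughout; this is not a real framework's API.

interface IncomingRequest {
  path: string;
  headers: Record<string, string | undefined>;
}

interface Session {
  userId: string;
  roles: string[];
}

interface HandlerResponse {
  status: number;
  body: string;
}

// Stand-in for a real session store lookup (assumption for the example).
const sessions = new Map<string, Session>([
  ["token-abc", { userId: "u1", roles: ["admin"] }],
  ["token-xyz", { userId: "u2", roles: ["customer"] }],
]);

function getSession(req: IncomingRequest): Session | null {
  const auth = req.headers["authorization"] ?? "";
  const token = auth.startsWith("Bearer ") ? auth.slice("Bearer ".length) : "";
  return sessions.get(token) ?? null;
}

// The handler authenticates and authorizes on its own. It does not assume an
// upstream middleware already ran, and it never reads a request header that
// claims the check was already done.
function refundHandler(req: IncomingRequest): HandlerResponse {
  const session = getSession(req);
  if (!session) return { status: 401, body: "unauthenticated" };
  if (!session.roles.includes("admin")) return { status: 403, body: "forbidden" };
  return { status: 200, body: `refund issued by ${session.userId}` };
}

// An attacker-supplied "skip" header changes nothing, because nothing consults it.
const spoofed = refundHandler({
  path: "/api/refund",
  headers: { "x-internal-skip-auth": "1" },
});
console.log(spoofed.status);
```

Middleware can still run in front of this as a first filter; the design choice is only that a bypass there no longer opens the route.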
Counter-Evidence / Limits
The corpus is not unanimous, and the disagreement is the signal. Multiple practitioners insist vibe coding is excellent for the right scope: prototypes, throwaways, and "where the only person who gets hurt if it has bugs is you" Source 41, Source 7. Several argue the security gap is temporary and that newer models will bake in secure-by-default output as customers demand it Source 35, and the "vibe then verify" framing treats fast generation and independent verification as complementary rather than opposed Source 70. The honest limit: the dividing line between "safe to vibe" and "must engineer" is genuinely fuzzy and moves as models improve Source 30, so the rule cannot be "never vibe code"; it must be "know which mode you're in and gate by stakes." Where the sources converge completely is the production bar: once your bug can harm someone else, prompt-and-ship is a regression Source 41, Source 42, Source 47.
Try This Week
Map. Spend 30 minutes mapping every code path in one service that touches auth, payments, or secrets. For each, mark (a) whether the security check lives in a single point of failure like middleware, and (b) whether any of it was AI-generated without a human threat-model pass. The output is a one-page risk inventory, which is the precondition for moving those checks into the route and assigning each an accountable owner Source 42, Source 74, Source 170.
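If it helps to keep the inventory machine-readable, something like the following would do. The `CodePath` shape, the `buildRiskInventory` helper, and the sample paths and owners are all made up for illustration; a spreadsheet with the same columns works just as well.

```typescript
// Hypothetical inventory shape; the sample paths and owners are invented data.

interface CodePath {
  path: string;
  touches: Array<"auth" | "payments" | "secrets">;
  checkOnlyInMiddleware: boolean; // (a) security check lives in a single point of failure
  aiGenerated: boolean;
  threatModeled: boolean;         // a human threat-model pass has been done
  owner: string | null;           // accountable human, if one is assigned
}

interface Finding {
  path: string;
  issues: string[];
}

// Keep only the paths that need action, with the reasons spelled out.
function buildRiskInventory(paths: CodePath[]): Finding[] {
  return paths
    .map((p) => {
      const issues: string[] = [];
      if (p.checkOnlyInMiddleware) issues.push("security check lives only in middleware");
      if (p.aiGenerated && !p.threatModeled) issues.push("AI-generated without a human threat-model pass");
      if (!p.owner) issues.push("no accountable owner");
      return { path: p.path, issues };
    })
    .filter((f) => f.issues.length > 0);
}

const sample: CodePath[] = [
  { path: "/api/checkout", touches: ["payments"], checkOnlyInMiddleware: true, aiGenerated: true, threatModeled: false, owner: null },
  { path: "/api/login", touches: ["auth"], checkOnlyInMiddleware: false, aiGenerated: false, threatModeled: true, owner: "dana" },
  { path: "/api/keys/rotate", touches: ["secrets"], checkOnlyInMiddleware: false, aiGenerated: true, threatModeled: true, owner: "sam" },
];

const inventory = buildRiskInventory(sample);
for (const f of inventory) {
  console.log(`${f.path}: ${f.issues.join("; ")}`);
}
```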
Sources
Beyond Vibe Coding with Addy Osmani
Meta E7: The Roadmap For Every Level Engineer In 2026
The AI vulnerability apocalypse, a new strain of Petya and dumb cybersecurity rules
How to write a good spec for AI agents- Engineering Docs
Is your robot vacuum safe? Here's why it matters.
Spec-Driven Development: AI Assisted Coding Explained
Manus, vibe coding, scaling laws and Perplexity's AI phone
Highlights from my conversation about agentic engineering on Lenny's Podcast
AI writes code faster. Your job is still to prove it works.
Code security for software engineers
AI Dev 25 x NYC | Manish Kapur: Assessing the Quality of AI Generated Code
Next.js rocked by critical 9.1 level exploit...
What cybersecurity pros need to know about OpenClaw and Moltbook
The Career Bet Every Engineer Must Make
This Is Bad!