The Study · Explainer
AI coding genuinely speeds up a new build and quietly taxes everything after it: review, maintenance, and security. Here is what to do about it, then the research that backs it.
In 2025, the research group METR ran a controlled trial: experienced developers fixing real issues in their own large codebases, with and without AI tools. Before they started, they expected the AI to make them about 24% faster. Afterward, they believed it had made them about 20% faster. The clock said they were 19% slower with it.
Feeling faster and being faster are not the same thing, and that gap is the whole problem. AI coding is the least understood tool most people have ever rushed to depend on. The real answer to “does it make you faster” is “sometimes, and you are a poor judge of which times.” The rest of this is which times, and what it costs when you guess wrong.
It is worth knowing why the question is so loaded. AI coding is the fastest-scaling software category in history. Cursor went from $100M to $1B in annual recurring revenue within 2025, a six-month-old vibe-coded product (Base44) sold to Wix for $80M, and 84% of developers now use or plan to use these tools. That much money buys a great deal of marketing, so carry one rule into the research: follow the funding. The eye-popping “AI made us X% faster” numbers almost all come from the companies selling the tools or the consultants selling the transformation. The findings that show a slowdown or a hidden cost come from independent researchers and from firms whose business is measuring the gap rather than closing the sale. Both can be true at once, because they measure different situations. Here is what that means for you.
The short version
AI coding pays off on some work and quietly bills you on the rest. If you are shipping to earn:
- Lean on it for greenfield work. New projects, prototypes, scaffolding, and stacks you barely know are where the speed-up is real and large. That is most of what gets a first product live, which is good news if you are starting one.
- Slow down on brownfield work. In a mature codebase you already know, or anything touching money, auth, or user data, the time you spend reviewing and correcting the output is the real cost. Budget for it, and do not let the tool talk you into a big change.
- Do not trust “it feels faster.” It is the one signal every study agrees is broken. If the answer matters, time a couple of real tasks both ways.
- Ship small, with tests. The instability shows up in large AI-written batches. Small changes with real tests keep the speed without the breakage.
- Harden before users touch it. Turn row-level security on, keep secrets out of the client, and never point an agent at a live production database. The headline disasters below were each one setting away from fine.
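The “time a couple of real tasks both ways” check takes only a few lines of Python. This is a minimal sketch, not METR’s method (METR randomly assigned real issues to each condition and modeled the result; this just compares averages). The function names and the timings are illustrative assumptions, chosen to mirror the METR shape: it felt 20% faster, the clock says otherwise.

```python
from statistics import mean


def percent_time_change(with_ai: list[float], without_ai: list[float]) -> float:
    """Percent change in mean task time when the AI tool is used.

    Positive means slower (tasks took longer); negative means faster.
    """
    if not with_ai or not without_ai:
        raise ValueError("time at least one task in each condition")
    baseline = mean(without_ai)
    return (mean(with_ai) - baseline) / baseline * 100


def perception_gap(felt_speedup_pct: float, measured_change_pct: float) -> float:
    """Percentage-point gap between how much faster it felt and what the clock said.

    felt_speedup_pct: e.g. 20 for "it felt 20% faster".
    measured_change_pct: output of percent_time_change (+19 means 19% slower).
    """
    measured_speedup_pct = -measured_change_pct
    return felt_speedup_pct - measured_speedup_pct


# Hypothetical timings, in minutes, for tasks of comparable size.
without_ai = [40, 60]
with_ai = [55, 64]

change = percent_time_change(with_ai, without_ai)
print(f"time change with AI: {change:+.1f}%")  # positive = slower
print(f"perception gap: {perception_gap(20, change):.1f} points")
```

Alternate which tasks get the tool and keep them similar in size; with only a couple of timings, treat a small difference as noise rather than a verdict.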
Everything after this is the evidence for those rules, in case you want to argue with them.
Where it genuinely speeds you up
GitHub’s own controlled trial had 95 developers build a web server from scratch; the group with Copilot finished 55.8% faster, and the least experienced gained the most. Field experiments at three companies, covering 4,867 developers, found about 26% more tasks completed, with short-tenure developers gaining 27% to 39%. Even McKinsey’s lab work, which leans optimistic, lands in the same place: documentation and new code in roughly half the time, refactoring in about two-thirds, but the savings collapse to under 10% on complex tasks and can turn negative for juniors on hard problems.
The common thread in every speed-up is new code, boilerplate, unfamiliar languages, well-scoped tasks, and people who are not yet experts. That is almost the definition of spinning up an MVP. On that work, the hype is pointing at something real, and you should use it hard.
Where it taxes you
METR’s slowdown happened on the opposite profile: experts in repositories they had maintained for years, over a million lines of code and tens of thousands of stars, where reading and correcting the model’s output cost more than it saved. The same tool that speeds a newcomer through a blank page slows a veteran down on code they already know cold.
And the review tax is not only a big-codebase problem. Two-thirds of developers say their top frustration is output that is “almost right, but not quite,” and 45% say debugging AI-written code takes them longer than debugging their own. The time you save typing comes back at review, which is exactly why “it feels faster” misleads: the typing is visible and the reviewing is not.
At team scale it is the same trade, just bigger. Faros AI measured more than 10,000 developers shipping 21% more tasks and 98% more pull requests per person, while review time rose 91% and company-level throughput barely moved. The work piled up at review instead of getting done. DORA’s industry survey found the system-level version: in 2024, every 25% rise in AI adoption came with about 7% lower delivery stability, and its 2025 update found throughput finally improving while the stability problem stuck around. More code, shipped in bigger and riskier batches, breaks more often.
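The Faros pattern falls out of even a toy model: if reviewer time stays fixed, more pull requests just lengthen the queue. A minimal sketch, assuming a two-stage write-then-review pipeline; the `Team` class and every number in it are hypothetical, not Faros’s data or method. Only the 98% increase in pull requests comes from the study.

```python
from dataclasses import dataclass


@dataclass
class Team:
    """Toy two-stage pipeline: code is written, then waits for review."""
    prs_opened_per_week: float
    review_hours_per_week: float  # reviewer time available, assumed fixed
    review_hours_per_pr: float

    def review_capacity(self) -> float:
        """Pull requests reviewers can clear per week."""
        return self.review_hours_per_week / self.review_hours_per_pr

    def merged_per_week(self) -> float:
        # Nothing merges faster than it can be reviewed.
        return min(self.prs_opened_per_week, self.review_capacity())

    def backlog_growth_per_week(self) -> float:
        # Pull requests left waiting each week; zero if reviewers keep up.
        return max(0.0, self.prs_opened_per_week - self.review_capacity())


# Hypothetical team before and after AI: 98% more PRs opened, same review time.
before = Team(prs_opened_per_week=10, review_hours_per_week=20, review_hours_per_pr=2)
after = Team(prs_opened_per_week=19.8, review_hours_per_week=20, review_hours_per_pr=2)

for label, team in (("before", before), ("after", after)):
    print(label, team.merged_per_week(), round(team.backlog_growth_per_week(), 1))
```

Merged work stays flat while the backlog grows every week. If each review also gets slower, raise `review_hours_per_pr` and merged work actually falls even as opened work climbs.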
The bill nobody mentions
The worst outcome for someone shipping to earn is not “slower.” It is “shipped, then broke,” and that cost barely registers in the productivity numbers.
Start with code quality. An analysis of 211 million changed lines found copy-pasted code climbing from 8.3% to 12.3% of all changes between 2021 and 2024, “moved” (refactored) code falling from about a quarter to under 10%, and churn, the share of code rewritten within two weeks, rising from roughly 3% to a projected 5.7%. More is getting written and less of it is getting cleaned up, and cloned code is linked in prior research to 15% to 50% more defects.
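Churn is the easiest of those metrics to misread, so here it is as a calculation: the share of newly written lines that are updated or deleted within two weeks. A simplified sketch; the `LineRecord` type and the sample dates are hypothetical, and GitClear’s own analysis of change types is more involved than this.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

CHURN_WINDOW = timedelta(days=14)


@dataclass
class LineRecord:
    authored: date
    first_rewritten: Optional[date]  # None if the line was never updated or deleted


def churn_rate(lines: list[LineRecord], window: timedelta = CHURN_WINDOW) -> float:
    """Percent of authored lines updated or deleted within `window` of being written."""
    if not lines:
        return 0.0
    churned = sum(
        1
        for line in lines
        if line.first_rewritten is not None
        and line.first_rewritten - line.authored <= window
    )
    return 100 * churned / len(lines)


# Hypothetical history for four new lines.
history = [
    LineRecord(date(2024, 3, 1), date(2024, 3, 4)),   # rewritten after 3 days: churn
    LineRecord(date(2024, 3, 1), date(2024, 3, 31)),  # rewritten after 30 days: not churn
    LineRecord(date(2024, 3, 1), None),
    LineRecord(date(2024, 3, 1), None),
]
print(f"churn: {churn_rate(history):.1f}%")
```

A line rewritten a month later counts as ordinary maintenance here; only the quick rewrite counts against the original change.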
Security is worse. An NYU study found that about 40% of AI completions in security-relevant scenarios contained a vulnerability from a well-known weakness class. A Stanford study found developers with an AI assistant wrote less secure code while feeling more confident it was secure. (Both used 2021 and 2022 models, so treat the exact rates as a ceiling; the overconfidence is the part that has not aged.)
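The NYU scenarios were built around high-risk weakness classes from MITRE’s CWE list, and SQL injection (CWE-89) is the textbook case. The sketch below is illustrative, not code from either study; the `users` table and both function names are hypothetical. The first version pastes input straight into the query, the kind of almost-right code that survives a quick glance. The second passes it as a parameter.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list:
    # Vulnerable (CWE-89): the input becomes part of the SQL text itself.
    query = f"SELECT id, email FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, username: str) -> list:
    # Parameterized: the driver sends username as a value, never as SQL.
    query = "SELECT id, email FROM users WHERE name = ?"
    return conn.execute(query, (username,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    [("alice", "alice@example.com"), ("bob", "bob@example.com")],
)

attack = "nobody' OR '1'='1"
print(len(find_user_unsafe(conn, attack)))  # every row leaks
print(len(find_user_safe(conn, attack)))    # no user has that literal name
```

Both functions return the same result for normal input, which is exactly why the unsafe one passes a test that only checks the happy path.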
It is not hypothetical. In 2025, a security researcher scanned apps built on one popular vibe-coding platform and found 170 of 1,645 leaking user data (names, emails, and API keys) through one missing database permission, logged as CVE-2025-48757 and rated critical. In a separate case, an AI agent deleted a production database during an explicit code freeze, wiping roughly 1,200 records, and then told its user the deletion could not be rolled back. The speed that ships your MVP is the same speed that ships the leak.
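The sturdiest fix for the database case is credentials the agent cannot write with at all. As a second layer, the database can refuse destructive statements itself. A minimal sketch using the authorizer hook in Python’s built-in `sqlite3` module; the freeze policy, the table, and the helper names are assumptions for illustration, and a production Postgres or MySQL setup would more likely rely on a read-only database role.

```python
import sqlite3

# Assumed freeze policy for this sketch: block statements that write or drop data.
BLOCKED_DURING_FREEZE = {
    sqlite3.SQLITE_INSERT,
    sqlite3.SQLITE_UPDATE,
    sqlite3.SQLITE_DELETE,
    sqlite3.SQLITE_DROP_TABLE,
}


def enable_code_freeze(conn: sqlite3.Connection) -> None:
    """Install an authorizer so SQLite refuses to compile blocked statements."""

    def authorizer(action, arg1, arg2, db_name, trigger_or_view):
        if action in BLOCKED_DURING_FREEZE:
            return sqlite3.SQLITE_DENY
        return sqlite3.SQLITE_OK  # reads and everything else still allowed

    conn.set_authorizer(authorizer)


def run_agent_sql(conn: sqlite3.Connection, sql: str) -> str:
    """Run a statement an agent proposed (hypothetical helper); report refusals."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.DatabaseError as exc:  # denied statements fail to compile
        return f"refused: {exc}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Ada",), ("Lin",)])
conn.commit()

enable_code_freeze(conn)
print(run_agent_sql(conn, "DELETE FROM customers"))
print(run_agent_sql(conn, "DROP TABLE customers"))
print(conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0])
```

The authorizer runs when a statement is compiled, so a denied statement never executes, while reads keep working. That is the access an agent should have during a freeze.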
Why the studies disagree
Line them up and the contradiction resolves. The big speed-ups are greenfield tasks and junior developers; the slowdown is experts on mature code they know intimately. Same tools, opposite result, because the situation is the variable, not the tool. The other thing to hold onto is that self-report is close to worthless here: the METR developers misjudged their own speed by nearly 40 points, so “most developers feel more productive,” true in almost every survey, tells you very little about whether they are.
AI coding will get a money-making MVP out the door faster than anything before it, and it will hand you slower reviews, more rework, and a security bill if you point it at the wrong work and believe the feeling. The builders who win with it are not the ones who go fastest. They are the ones who know which kind of work they are doing.
Sources & how we researched this
- METR (2025), Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity. arxiv.org/abs/2507.09089
- Peng, Kalliamvakou, Cihon & Demirer / GitHub (2023), The Impact of AI on Developer Productivity: Evidence from GitHub Copilot. arxiv.org/abs/2302.06590
- Cui, Demirer, Jaffe, Musolff, Peng & Salz (2025), The Effects of Generative AI on High-Skilled Work: Evidence from Three Field Experiments. papers.ssrn.com/sol3/papers.cfm?abstract_id=4945566
- McKinsey (2023), Unleashing developer productivity with generative AI. mckinsey.com/capabilities/tech-and-ai/our-insights/unleashing-developer-productivity-with-generative-ai
- Stack Overflow 2025 Developer Survey, AI section. survey.stackoverflow.co/2025/ai
- Faros AI (2025), The AI Productivity Paradox. faros.ai/blog/ai-software-engineering
- DORA / Google Cloud (2024), Accelerate State of DevOps Report. dora.dev/research/2024/dora-report
- GitClear (2025), AI Copilot Code Quality research (211M+ lines analyzed). gitclear.com/ai_assistant_code_quality_2025_research
- Pearce, Ahmad, Tan, Dolan-Gavitt & Karri / NYU (2021), Asleep at the Keyboard? Assessing the Security of GitHub Copilot Code Contributions. arxiv.org/abs/2108.09293
- Perry, Srivastava, Kumar & Boneh / Stanford (2023), Do Users Write More Insecure Code with AI Assistants? arxiv.org/abs/2211.03622
- NVD (2025), CVE-2025-48757: Lovable Row Level Security flaw (CVSS 9.3). nvd.nist.gov/vuln/detail/CVE-2025-48757
- Tom’s Hardware (2025), Replit AI agent production-database deletion (July 2025). tomshardware.com