Developers won’t work without AI anymore. The research says it might be making them worse.

Developers are refusing to work without AI tools, even for research studies, but evidence from Amazon, Uber, and independent researchers shows that AI-assisted coding may actually slow developers down and increase long-term maintenance costs. Amazon shut down an internal token-tracking leaderboard after employees gamed the system with excessive AI use, while Uber burned through its entire 2026 AI budget in four months without measurable productivity gains. The findings challenge the widespread industry assumption that more AI token consumption translates to higher output, with data showing AI-generated code introduces 1.7 times more problems than human-written code.

TL;DR: Devs refuse to code without AI, but research shows it may slow them down. Amazon killed its token leaderboard. Uber blew its AI budget in four months. METR couldn't repeat its productivity study because developers refused to participate without AI. Amazon and Uber discovered that more tokens don't mean more output.

In February 2026, AI research lab METR tried to repeat a groundbreaking study measuring how much time developers take to complete tasks with and without AI. It could not. Developers refused to participate (https://techcrunch.com/2026/05/29/coders-are-refusing-to-work-without-ai-and-that-could-come-back-to-bite-them/) because they would not work without AI, even for a limited number of tasks in a research setting.

The original 2025 study had produced a surprising result. Developers reported that AI made them more productive. The data showed the opposite: AI actually slowed them down because they spent extra time finding and fixing errors, steering the AI, and waiting for it to complete tasks.

Unable to replicate the experiment, METR published a survey in May instead. Developers self-reported that AI made them twice as valuable to their organisations. Recent evidence from multiple sources suggests that perception is wrong.

Amazon shut down an internal token-tracking leaderboard called Kirorank this week, the Financial Times reported. Employees were gaming it by using AI agents excessively and running up costs. The leaderboard proved that AI use does not automatically translate to increased productivity.

Uber blew through its entire 2026 AI budget within the first four months of the year, The Information reported. COO Andrew Macdonald said on a podcast that the spending had not led to a measurable increase in projects or productivity.

Two of the world’s most technically sophisticated companies spent heavily on AI coding tools and could not demonstrate a return.

The term for this pattern is “tokenmaxxing”: using token consumption as a proxy for productivity. It has been the corporate trend of 2026. It may already be over. The Amazon and Uber examples show that measuring AI adoption by volume of use, rather than quality of output, produces the wrong incentives.

Salesforce projects $300 million in Anthropic token spending this year (https://thenextweb.com/news/salesforce-benioff-300-million-anthropic-tokens-slack-coding). CEO Marc Benioff called for an “intermediary layer” that could route tokens intelligently between frontier and cheaper models. The call for a routing layer is an implicit admission that not every token produces value, and that spending needs to be matched to task complexity.

The code quality problem is the deeper issue. Programmer and author James Shore argued in a viral blog post that faster code generation without reduced maintenance costs is a trap. “You write code twice as quick now? Better hope you’ve halved your maintenance costs,” he wrote. “Otherwise, you’re screwed. You’re trading a temporary speed boost for permanent indenture.”

The data supports the warning. Entelligence AI, a reliability engineering startup, claims that companies spend 44% of their tokens on bug fixes that their AI generated. CodeRabbit, a code-reviewing tool, analysed open-source pull requests and found that AI produced 1.7 times more problems than human code. Both companies sell AI code review tools, which makes the statistics self-serving but not necessarily wrong.

Independent researchers at Singapore Management University published a report in April reaching the same conclusion. “AI-generated code can introduce long-term maintenance costs into real software projects,” they wrote. The code ships faster. The bugs arrive later. The maintenance debt compounds.

The question engineering leaders are avoiding (https://thenextweb.com/news/the-question-ai-providers-hope-vps-of-engineering-never-ask) is whether the productivity gains from AI coding tools are real or perceived. If developers refuse to work without AI but the AI is generating more bugs than it prevents, the net effect could be negative. The dependency has outpaced the evidence.

Scott Wu, founder of Cognition, which makes the AI coding agent Devin, admits the tool’s skill level sits between a junior and mid-level programmer depending on the task. It is not a hand-off-and-forget solution. The SMU researchers recommend treating AI output the way you would treat code from a junior developer: review everything, maintain strong QA systems, and keep humans responsible for architecture and security design.

The job market reflects the contradiction (https://thenextweb.com/news/new-ai-jobs-evangelist-philosopher-vibecoder-fde). Companies are hiring “vibe coders” and forward deployed engineers at unprecedented rates while simultaneously discovering that the tools those roles depend on may not produce the quality gains their hiring assumes. The AI coding market is growing faster than the evidence that it works.

Developers will not go back to coding without AI. That ship has sailed. The question is whether the industry will build the quality assurance infrastructure, the routing layers, and the review processes needed to ensure that faster code production does not become faster technical debt production. Right now, the answer is no. Developers love the tools. The tools may not love them back.