{"slug": "we-taught-ai-to-write-code-but-we-forgot-to-teach-it-to-think", "title": "We Taught AI to Write Code But We Forgot to Teach It to Think.", "summary": "Nearly half of all commercial code is now AI-generated, but code churn has doubled and engineers increasingly cannot explain how their own systems work. The crisis stems from AI coding assistants producing clean, correct-looking code that passes review yet remains incomprehensible to human developers, creating a hidden debt where the fast part of development has accelerated while debugging and understanding code has become measurably harder. Teams report feeling 20% faster with AI tools, yet end-to-end measurements show they are roughly 19% slower due to the increased time required to verify and trace AI-generated code.", "body_md": "# We Taught AI to Write Code But We Forgot to Teach It to Think.\n\n### Almost half of commercial code is now AI-generated. Code churn has doubled and the engineers who built these systems can't always explain how they work anymore.\n\n### The code looks fine. That is the problem.\n\nClean PR. Tests green. Knot in your stomach.\n\nYou read the diff. Properly indented. Variables make sense. Reads like a dream.\n\nThen you try to trace what it actually does from start to finish.\n\n**You have no idea.**\n\nThat is software development in 2026.\n\n[AI coding assistants](https://www.inc.com/joe-procopio/anthropics-ceo-said-all-code-will-be-ai-generated-in-a-year/91163367) gave us a strange new superpower: we can now generate more correct-looking code than our brains can comprehend.\n\nThe scariest problems do not crash the app. They do not throw errors. They slip past review because the code is beautiful.\n\nWe are flying blind. Trusting surface signals. Losing our grip on how our own systems work.\n\nA quiet crisis.\n\n📢 *A quick word before we get into it.*\n\nThe article is about code you cannot fully see. 
The same problem is now hitting production access.\n\n**Mercari migrated their in-house JIT access system to Opal.** Same zero-touch posture. Zero KTLO. The real reason:\n\n[AI agents are next](https://www.opal.dev/customers/mercari/?utm_source=ai-corner&utm_medium=cpc&utm_campaign=nhi&utm_term=ai-access&utm_content=primary&hstk_campaign=&hstk_network=ai-corner&hsa_acc=45127704&hsa_cam=&hsa_net=ai-corner). Every agent now governed like a human identity. Least agency enforced. Circuit breakers in place. Rogue agents isolated. One identity model for humans and agents.\n\nIf AI writing the code keeps you up at night, AI calling your [production APIs](https://www.opal.dev/customers/mercari/?utm_source=ai-corner&utm_medium=cpc&utm_campaign=nhi&utm_term=ai-access&utm_content=primary&hstk_campaign=&hstk_network=ai-corner&hsa_acc=45127704&hsa_cam=&hsa_net=ai-corner) probably should too.\n\n## Table of Contents\n\n1. Faster Doesn’t Mean Better, But We Keep Forgetting That\n\n2. The Debt You Can’t See Grows the Quickest\n\n3. Why Code Review Is Failing When It Matters Most\n\n4. The Slow Disaster Nobody Voted For\n\n5. What Governance Actually Looks Like When It Works\n\n6. The Uncomfortable Truth About What This Means for Your Team\n\n## 1. Faster Doesn’t Mean Better, But We Keep Forgetting That\n\nFor a while, the problem is **easy** to ignore.\n\nOver time, it becomes **impossible** to pretend it isn’t there. There’s a real tension at the heart of AI-assisted development that **most teams** haven’t fully sat with yet.\n\nThe [experience of going faster](https://www.gitclear.com/coding_on_copilot_data_shows_ais_downward_pressure_on_code_quality) and the reality of slowing down are not mutually **exclusive**. These tools have made both possible at once.\n\nMost teams are only tracking **one** of them.\n\n### Where the time actually goes\n\nWriting code has never been the **bottleneck** in professional software development. 
**Understanding** it is. **Debugging** it is. So is **modifying code** whose reasoning you can’t fully trace.\n\n**AI** has made the fast part of development faster while making the slow parts measurably harder.\n\nThe **numbers** make this concrete.\n\nDevelopers consistently report feeling around **20% faster** with AI tools. Measured end to end, through review, integration and production fixes, teams frequently land roughly **19% slower**.\n\nThat gap stops being surprising once you trace where the time actually goes. [Generating code](https://www.the-ai-corner.com/p/ai-coding-tools-complete-guide-2026) has become **cheaper**. Verifying and understanding it has become **more demanding**, because there is simply more of it moving through the system at once.\n\n### The planning problem nobody talks about\n\nThe deeper issue is what that gap does to how teams make **decisions**.\n\nYou **feel** faster, so you plan around output.\n\nDeadlines **tighten**, backlogs look **manageable**, and stakeholders see the commit velocity and feel good about it.\n\nBut the **constraint** doesn’t disappear. It moves into parts of the process that are harder to accelerate and easier to **underestimate**: the long review discussions, [the bugs that take three times as long to isolate](https://uplevelteam.com/blog/ai-for-developer-productivity) because the surrounding logic is **unfamiliar**, and the features that keep spilling into the **next sprint**.\n\n## 2. The Debt You Can’t See Grows the Quickest\n\nThere’s a point where the issue **stops** being about speed and starts being about something harder to name.\n\nThe system keeps **growing**.\n\nThe code looks clean. Nothing obvious signals that anything is wrong. 
But if you ask a simple question, “*Who on this team fully understands this part of the system end to end?*”, the answers get vague **very quickly**.\n\n### What comprehension debt actually looks like\n\nGoogle engineer **Addy Osmani** called this [comprehension debt](https://addyosmani.com/blog/comprehension-debt/): the growing gap between how much code exists and how much anyone genuinely understands.\n\nUnlike traditional technical debt, which announces itself through slow **builds** and **friction** you can feel, comprehension debt breeds false confidence.\n\nThe system **moves**. Tests **pass**.\n\n**Velocity** looks fine.\n\nThe problem is that **movement** is happening on top of a layer fewer and fewer people can actually speak to.\n\nResearcher **Margaret-Anne Storey** documented this with a student team that hit a wall seven weeks in, not because the code was messy, but because nobody could explain why decisions had been made or how the system was supposed to **fit together**.\n\nThe shared **theory** of the software had evaporated.\n\nMargaret-Anne Storey’s [Socio-Technical Model](https://www.researchgate.net/figure/Socio-Technical-Model-for-Collaborative-Software-Development-including-the-a-societys_fig1_313910632) illustrates how software development relies on layers of human interaction. AI tools often disrupt the “Emergence” of understanding from the individual to the team level.\n\n### The 18-month arc\n\nMultiple teams have now reported the same **trajectory**.\n\nThe first **three months** feel like a clear win. The team ships faster, backlogs shrink, and adopting AI tools feels like the right call.\n\nBy months **four through nine**, something **changes**. Reviews take **longer**. Changes cover more ground than expected. The code still gets approved, but reconstructing the intent behind it takes real effort.\n\nAround months **ten to fifteen**, a bug takes longer to fix than it should. 
The code is readable, but tracing how pieces interact eats time nobody has.\n\nBy months **sixteen to eighteen**, teams start to hesitate. Parts of the codebase feel like territory you approach **carefully**, even when nothing looks obviously wrong.\n\nThat’s when it **becomes clear**: the system is no longer fully legible to the people responsible for it.\n\n**Three patterns** drive this.\n\nAI models **evolve underneath** you, so similar prompts produce inconsistent results over time.\n\nChanges arrive larger and less scrutinized than before.\n\nAnd **AI-generated** code looks competent: clean, commented, sensibly named. That earns **trust** on appearance rather than understanding.\n\nNone of this shows up on a **dashboard**. The system appears **healthy**. But the ability to safely modify it is quietly **eroding**.\n\n## 3. Why Code Review Is Failing When It Matters Most\n\nCode review gets a bad reputation. It’s slow, it’s sometimes awkward and **nobody** loves having their work picked apart on a Friday afternoon.\n\n**It was never** only about catching bugs. It’s one of the few moments where [one engineer](https://www.the-ai-corner.com/p/ai-engineer-roadmap-production-projects-2026) actually gets inside another’s head.\n\nYou start to see how they **think**.\n\nWhy they made that call and not a different one. **Where** they decided to stop.\n\nDo that enough times, with the same people, on the same codebase, and something **you can’t really measure** starts to form.\n\nEveryone just kind of knows how the **system behaves**. Not because it’s documented somewhere.\n\nBecause they’ve lived in it **together**. Lose that, and the code doesn’t suddenly break. It just becomes a place [nobody fully owns anymore](https://opensourceready.substack.com/p/death-by-a-thousand-ai-pull-requests).\n\n### The scale problem\n\nThat function depends on a certain scale.\n\n**Reviews work** when a person can reasonably hold the change in their head. 
Not every **detail**, but enough to trace the intent and spot where something feels off.\n\nWhen that boundary is **crossed**, the nature of the review **changes**. It becomes less about understanding and more about scanning.\n\n**AI-assisted** workflows push against that boundary constantly.\n\nA change that might once have been **50 lines** shows up as several hundred, covering multiple concerns at once.\n\nVerifying it properly means reconstructing how the pieces fit together, not just checking **syntax** or **style**. Most reviewers don’t have the **time**.\n\nWe are already seeing the fallout of this in real time. Industry outlets like The New Stack recently reported on a growing crisis in which engineering teams and open-source maintainers are actively “[drowning in AI-generated code](https://thenewstack.io/ai-generated-code-crisis/).”\n\nSo behavior **adapts**. Reviews become **lighter**.\n\nPeople look for obvious issues, rely on tests as a proxy for correctness and move on.\n\nNothing about this feels irresponsible in the moment. The problem is that the standard for approval **has changed** without anyone explicitly deciding it should.\n\n### The generational consequence\n\n**Strong** [engineering](https://www.the-ai-corner.com/p/stanford-cs230-ai-engineering-rules-2026) has always been built through friction.\n\n**Write** something, **watch** it break, **build** intuition through debugging.\n\nWhen a tool **handles** most of the implementation, that loop **compresses**. The code works often enough that you stop interrogating **why**.\n\nOver time, that reshapes what a team collectively knows.\n\nYou still have people who can produce code quickly. What becomes **less certain** is how many can take a failing system, step through it and find the fix without reaching for the same tools that built it.\n\nCode review **sits** right at that intersection. 
Under more pressure than ever, at exactly the moment its role in maintaining shared **understanding** matters most.\n\n## 4. The Slow Disaster Nobody Voted For\n\nSo far this probably sounds like a tooling problem. Something to **fix, tweak, manage better**.\n\nBut **that’s not** really what’s going on.\n\nThe **harder truth** is that most of the behaviour driving this makes complete sense.\n\nEveryone is acting **rationally**.\n\nThat’s exactly what makes it so **difficult to stop**.\n\n### The deferral logic\n\nIf you assume that models will keep improving, deferring cleanup starts to look like the sensible choice. **Why spend time** simplifying something today when a better model will be able to read, refactor, or regenerate it more easily in a few months?\n\nThe **cost of waiting** appears to go down over time. So teams push forward.\n\nThey build faster, accept a little more [opacity](https://www.tembo.io/blog/ai-technical-debt) and trust that future tools will **make sense of it when needed**.\n\nEach **individual decision** is easy to justify: shipping now has immediate value, and cleaning up later feels like a smaller, more flexible cost.\n\nWhen everyone operates under that **assumption**, the system fills up with code that works but isn’t deeply understood. It doesn’t feel dangerous, because nothing breaks right away.\n\n### The subprime logic applied to software\n\nThis is the structure of the **2008 mortgage crisis** applied to codebases. In 2008, the risk wasn’t mainly created by obviously reckless decisions.\n\nIt built up because the system rewarded short-term gains and **made future costs** look manageable.\n\nEveryone **acted in ways** that made sense locally.\n\nThe **failure came** from how those decisions stacked together over time and from the assumption that conditions would keep moving in a direction that made the exposure feel safe.\n\nThe same structure is playing out here. 
**Teams optimize** for delivery because that’s what gets measured.\n\nThey rely on improving tools because that has been a **reasonable assumption** so far.\n\nOver time, they **accumulate** a codebase that sits just beyond what they can confidently hold, but not far enough beyond to force a correction.\n\nWhen the codebase does slip far enough to force a **correction**, the options narrow quickly. What looked like flexibility earlier turns into constraint. And the cost you thought you were deferring doesn’t disappear.\n\nWhen it **finally shows up**, it doesn’t trickle in. It lands all at once, and by then it’s a much [bigger mess](https://www.buildmvpfast.com/blog/ai-generated-code-technical-debt-management-2026) than it ever needed to be.\n\n## 5. What Governance Actually Looks Like When It Works\n\nBy the time teams feel the drag, it’s **already late**.\n\nThe ones that avoid it tend to behave differently much earlier, often before generated code becomes a **large share** of their workflow.\n\nThey don’t treat generation as the starting point.\n\nThey treat **the system** as the starting point.\n\n### Architecture first, generation second\n\nYou can see this in how **high-performing** teams handle architectural decisions.\n\nThere’s usually a record of where the system is headed, often captured as [architecture decision records](https://continuous-architecture.org/practices/architecture-decision-records/?utm_source=chatgpt.com): not a thick document nobody reads, but **actual decisions** that got written down and get checked on.\n\nWhen new code comes in, whether a person or a model wrote it, it’s **expected** to fit that direction. Someone senior owns that.\n\nThey **hold the line**.\n\n**Most** teams don’t work this way.\n\n**Architectural ownership** tends to be everyone’s responsibility, which usually means it’s nobody’s.\n\n### Tiered review, not uniform scrutiny\n\nReview works differently in these teams. 
Not every change gets treated the same, but the **difference is explicit**.\n\nSmall, low-risk changes move **quickly**.\n\nLarger changes, especially those that introduce new patterns or touch critical paths, are **slowed down** on purpose.\n\nWhat matters is not just whether the code runs, but whether the person who wrote it or approved it **can explain how it works** and why it was structured that way.\n\nThat **expectation** shows up in subtle ways: a pull request is less likely to move forward if the author cannot walk through the logic, even if the tests pass.\n\nAnd the reasoning behind it is pretty **simple**.\n\nAI is great at handling the **boring, repetitive** stuff.\n\nSo let it.\n\nThe time you save there goes back into **actually understanding** the code that matters.\n\nThe code that, if it breaks, hurts.\n\nThe net cost is roughly neutral. The **risk profile is not**.\n\n### Measuring what actually matters\n\nThese teams also watch **different** numbers.\n\nVelocity still gets tracked, but it’s **not the only** signal that matters.\n\nCode churn, meaning how often recently merged code needs to be substantially revised, is a reliable early indicator that comprehension debt is building.\n\n**How long** it takes to fix a bug in AI-generated code tells you a lot.\n\nIf it’s taking forever, chances are **nobody really understood** that code in the first place.\n\nNeither of these metrics is exotic.\n\nMost teams just aren’t tracking [them](https://www.coderabbit.ai/blog/2025-was-the-year-of-ai-speed-2026-will-be-the-year-of-ai-quality).\n\nThe result, counterintuitively, is that these teams end up moving faster over time. Because the system remains **understandable**, changes carry less hidden risk.\n\nEngineers spend less time rediscovering how things work.\n\n**Decisions hold** instead of being undone.\n\n## 6. 
The Uncomfortable Truth About What This Means for Your Team\n\nThese tools **amplify** whatever is already there.\n\nStrong foundations, clear **ownership**, a team that actually understands how the pieces fit together. All of it gets faster and better.\n\nBut if those foundations were **shaky** to begin with, the same acceleration just pushes more code into the places that were already hard to work through.\n\nThe gaps don’t stay the same **size**.\n\nThey grow.\n\nDecisions that were slightly unclear become harder to **untangle**.\n\nThings that could have been sorted out early start to feel **permanent**.\n\n### Judgment matters more, not less\n\nOver time, one thing becomes clear. Your team’s **judgment** has never mattered more.\n\nBecause the leverage on that judgment has never been **higher**.\n\nEvery decision about what to generate, what to accept and what to revisit carries more **weight** than it used to.\n\nA mistake that once stayed **local** can now spread through the entire system before anyone realises what happened.\n\nThe tools, used carefully, deliver real **value**.\n\nBut the teams that hold onto that value over time are the ones who stay in control of what the code actually **means**. Not just how much of it **exists**.\n\n### The gap that doesn’t announce itself\n\nFor a while, everything still looks like **progress**.\n\nThe codebase grows, features ship, the **metrics** look fine. But the real gap, how much of the system anyone can actually **account** for, stays hidden.\n\nIt only becomes visible when something forces the **question**.\n\nBy then, the options are already **narrower**.\n\nThe tools did exactly what they were supposed to **do**. 
What built up quietly in the background was the cost of **treating** **understanding** as something that could always wait.\n\n**But it couldn’t.**", "url": "https://wpnews.pro/news/we-taught-ai-to-write-code-but-we-forgot-to-teach-it-to-think", "canonical_source": "https://www.the-ai-corner.com/p/ai-technical-debt-the-code-looks-fine", "published_at": "2026-05-14 15:05:35+00:00", "updated_at": "2026-05-25 16:40:16.259009+00:00", "lang": "en", "topics": ["artificial-intelligence", "ai-safety", "ai-tools", "ai-agents", "large-language-models"], "entities": ["Opal", "Mercari", "Anthropic"], "alternates": {"html": "https://wpnews.pro/news/we-taught-ai-to-write-code-but-we-forgot-to-teach-it-to-think", "markdown": "https://wpnews.pro/news/we-taught-ai-to-write-code-but-we-forgot-to-teach-it-to-think.md", "text": "https://wpnews.pro/news/we-taught-ai-to-write-code-but-we-forgot-to-teach-it-to-think.txt", "jsonld": "https://wpnews.pro/news/we-taught-ai-to-write-code-but-we-forgot-to-teach-it-to-think.jsonld"}}