{"slug": "why-ai-agents-cannot-change-software-systems", "title": "Why AI Agents Cannot Change Software Systems", "summary": "Current large language models cannot safely modify real software systems despite impressive code-generation demos, because they rely on pattern matching rather than causal reasoning. The fundamental gap lies in the distinction between additive tasks like writing new code and transformative tasks like modifying existing systems, which require understanding dependencies, invariants, and downstream consequences. While AI agents can assist with software delivery, they remain unreliable, non-autonomous, and unsafe for production use on real-world codebases.", "body_md": "This article explains why current LLMs cannot safely modify real software systems, despite impressive code‑generation demos.\n\n# The Promise of Automated Software Delivery\n\nIn 2026, the automated software delivery dream is for an agent to:\n\n- read a repository\n- understand project structure\n- plan a multi‑step change\n- write code, tests, and docs\n- run the code and fix its own mistakes\n- produce a PR‑ready diff\n\nThe first three tasks are additive; the last three are transformative. The first three add information without changing the behaviour of the system: they require reading, mapping, and planning, but not altering any existing causal structure in the codebase.\n\nApplying new code is self-contained, additive work; modifying an existing system is transformative work that requires an understanding of dependencies, invariants, and consequences. 
This distinction, additive versus transformative, is the core reason current LLMs can assist but cannot autonomously deliver software.\n\nParts of the above can be done, but only in tightly controlled demos on simple code that is tens of lines long, not on real-world repositories with thousands of lines of code that have existed for years and been updated by dozens of people.\n\n# What the Labs Have Actually Delivered\n\nThe agentic work of OpenAI, Google, Cognition Labs, GitHub (Microsoft),\nSourcegraph, JetBrains, Replit, Amazon, Meta, and Anthropic, listed in\n[Further Reading](#further-reading), was published in 2023 and 2024.\n\nDepending on where you look, you may have been given another impression: that \"agents are here\". However, reality tells a different story.\n\nAgents are improving, but are not reliable, not autonomous, and not production‑safe.\n\nLLMs can assist with software delivery, but they cannot own it.\n\n# Why is this?\n\nLLMs generate statistically plausible continuations of text. This works well for self-contained tasks like writing a function or drafting documentation because these are pattern‑extension problems. But pattern‑matching is not system understanding, and plausibility is not correctness.\n\nSoftware systems are causal: components depend on each other, invariants constrain behaviour, and changes propagate through the system. The moment a task stops being self‑contained and becomes system‑dependent (requiring dependency coherence, persistent state, or awareness of how changes ripple through a real codebase), pattern‑matching is no longer sufficient.\n\nCurrently, LLMs can imitate the shape of engineering work, but they cannot maintain a stable internal representation of a system that must be coherently changed, and that gap is exactly why LLMs fail the moment the task becomes system‑level.\n\n# Persistent state creates temporal dependencies\n\nA self‑contained task has no past and no future. 
A system‑dependent task does.\n\nAs soon as a change depends on:\n\n- previous writes\n- accumulated data\n- cached values\n- long‑lived objects\n- external system state\n\nany agentic model must reason about how the system got here and how it will behave after the change.\n\nLLMs cannot maintain that internal causal chain.\n\n# Writing code to Agentic Systems: The Fundamental Gap\n\nThe gap becomes clear when you compare two activities: writing new code and modifying an existing system.\n\nCode generation is local and additive: the model extends a pattern without needing to understand the system.\n\nBut agentic work is global and transformative: the LLM must change the system itself, which requires understanding dependencies, invariants, interactions, and downstream consequences.\n\nThis is causal reasoning, not pattern extension. LLMs predict tokens, not consequences, and that is why the leap from writing code to producing a safe, system‑aware PR‑ready diff is not incremental but a shift into a fundamentally different problem space.\n\n# Producing a PR‑ready diff (the section in question)\n\nA pull request (PR) is a proposed change to a system's code.\n\nFor that change to be safe, it must respect the system's current architecture, its intent, and all downstream consequences.\n\nSoftware engineers work hard to ensure that such a change is safe, through testing and their own judgement and experience, before having a colleague review the change.\n\nApplying a change is no longer a matter of pattern-matching but of understanding causal behaviour: how will the system change if this PR is applied?\n\nThe correctness of the PR depends on understanding the whole system, not just generating text.\n\nThe LLM must change the system, which requires understanding dependencies, invariants, interactions and consequences, all of which demand causal reasoning, not pattern matching.\n\nPattern‑matching can write code; only causal reasoning can maintain systems.\n\n# What can I 
do?\n\nConfirm for yourself any claim that you see. Choose your own *realistic*\nreal-world repository to test on: one with thousands of lines of code that\nhas supported real-world work patterns over time.\n\nYour own results, on your own repository, will tell you far more than any press release or online anecdote.\n\nFor the moment:\n\n- treat agentic AI as a strategic direction\n- treat current tools as assistants, not engineers\n- invest in clarity, architecture, and test discipline\n- expect progress, but not miracles\n- do not plan delivery pipelines around unproven capabilities\n\nMaintain human judgement as the centre of the system.\n\nThe dream is intact. The evidence is not yet here.\n\n# Why this matters: code is cheap, judgement is not\n\nLLM-augmented software delivery does not remove engineering.\n\nIt moves engineering up a level.\n\nHumans need to focus on:\n\n- intent\n- constraints\n- architecture\n- correctness\n- safety\n- trade‑offs\n\nThe desired end state is not \"AI writes code\" but \"AI maintains systems\". 
If we get there, humans will still need to maintain intent.\n\nThe consequence of an agentic system is not to *remove* engineering, but to\n*elevate* it, so that teams spend less time on mechanical construction and more time on\njudgement, alignment, and shaping the environment in which agents operate.\n\nThe organisations that benefit most will be those that treat agentic development not as automation, but as a structural shift in how software is conceived, validated, and maintained.\n\n# Final Thought\n\nUntil AI can reason causally about systems, human judgement remains the foundation of software delivery.\n\n# Related Work\n\n- [The real gains from AI come from improving the shared work between engineers (planning, coordination, review, debugging, and delivery), not from speeding up individual coding.](ai-engineering-team-based-ai.html)\n- [Software engineers must understand tokens, structure, and probabilistic behaviour to build reliable systems and avoid mismatches between test and production behaviour.](engineers-need-to-know.html)\n- [AI systems behave like probabilistic components; engineers must build structured interfaces and layered constraints to make them reliable inside software systems.](surface-area.html)\n\n# Table of Contents\n\n- [The Promise of Automated Software Delivery](#the-promise-of-automated-software-delivery)\n- [What the Labs Have Actually Delivered](#what-the-labs-have-actually-delivered)\n- [Why is this?](#why-is-this)\n- [Persistent state creates temporal dependencies](#persistent-state-creates-temporal-dependencies)\n- [Writing code to Agentic Systems: The Fundamental Gap](#writing-code-to-agentic-systems-the-fundamental-gap)\n- [Producing a PR‑ready diff (the section in question)](#producing-a-prready-diff-the-section-in-question)\n- [What can I do?](#what-can-i-do)\n- [Why this matters: code is cheap, judgement is not](#why-this-matters-code-is-cheap-judgement-is-not)\n- [Final Thought](#final-thought)\n- [Related Work](#related-work)\n- [Table of Contents](#table-of-contents)\n- [Further Reading](#further-reading)\n\n# Further Reading\n\n**OpenAI o1/o3**, OpenAI, September, 2024\n\n- https://openai.com/index/introducing-openai-o1-preview/\n\n**Gemini Code Demos**, Google, December, 2023\n\n- https://blog.google/technology/ai/google-gemini-ai/\n\n**Devin**, Cognition Labs, March, 2024\n\n- https://www.cognition-labs.com/\n\n**GitHub Copilot**, GitHub (Microsoft), November, 2023\n\n- https://github.blog/2023-11-08-the-new-github-copilot-your-ai-pair-programmer/\n\n**Cody**, Sourcegraph, April, 2024\n\n- https://sourcegraph.com/blog/cody-2-0\n\n**AI Assistant in JetBrains IDEs**, JetBrains, December, 2023\n\n- https://blog.jetbrains.com/blog/2023/12/06/jetbrains-ai-assistant-is-now-available/\n\n**Replit Agents**, Replit, November, 2023\n\n- https://blog.replit.com/agents\n\n**Amazon CodeWhisperer**, Amazon, April, 2023\n\n- https://aws.amazon.com/codewhisperer/\n\n**Code Llama**, Meta, August, 2023\n\n- https://ai.meta.com/blog/code-llama-large-language-model-coding/\n\n**Claude 3 Code Reasoning**, Anthropic, 
March, 2024\n\n- https://www.anthropic.com/news/claude-3-family", "url": "https://wpnews.pro/news/why-ai-agents-cannot-change-software-systems", "canonical_source": "https://phroneses.com/articles/build/notes/agents-cannot-maintain-systems.html", "published_at": "2026-05-27 13:46:38+00:00", "updated_at": "2026-05-27 14:11:19.071998+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-agents", "ai-research", "ai-safety"], "entities": ["OpenAI", "Google", "Cognition Labs", "GitHub", "Microsoft", "Sourcegraph", "JetBrains", "Replit"], "alternates": {"html": "https://wpnews.pro/news/why-ai-agents-cannot-change-software-systems", "markdown": "https://wpnews.pro/news/why-ai-agents-cannot-change-software-systems.md", "text": "https://wpnews.pro/news/why-ai-agents-cannot-change-software-systems.txt", "jsonld": "https://wpnews.pro/news/why-ai-agents-cannot-change-software-systems.jsonld"}}