{"slug": "shipping-enterprise-quality-code-with-ai-agents", "title": "Shipping enterprise-quality code with AI agents", "summary": "Sonar reports that AI agents generate bloated, unmaintainable code, citing a Carnegie Mellon study of 807 open-source projects using Cursor that found initial velocity gains vanished by month three while static analysis warnings rose 30% and code complexity rose 41%. Sonar's LLM Leaderboard shows GPT-5.4 High produced 1,159,000 lines for an 81.05% pass rate, while Claude Opus 4.7 Thinking used 336,000 lines for 82.52%, highlighting bloat variation. Sonar proposes the Agent Centric Development Cycle (AC/DC) of guide, verify, and solve to close the quality gap.", "body_md": "Developers are caught between the joy (or pressure) of using agents to ship 10x faster today and the dread of how they will maintain that code tomorrow. The gap between [“vibe” code](https://www.infoworld.com/article/4078884/what-is-vibe-coding-ai-writes-the-code-so-developers-can-think-big.html) and code that can be deployed to millions of users is vast and easy to underestimate. Closing the gap requires care, expertise, and effort, with the payoff coming later. Agents can complete increasingly complex programming tasks, but not at the quality we need. What’s missing, and how can we fill the gap?\n\nEnterprise code has to clear three bars: it must be maintainable, reliable, and secure. Out-of-the-box AI agents can miss all three. Let’s focus on the biggest and most visible maintainability issue, which is bloat: redundant validation, defensive checks that cannot fire, near-duplicate functions, and dead code that nothing removes. A `None` check on a parameter typed as `dict`. A `try`/`except` around a call that never throws. Two functions, identical except for the negation in their return statement. How much of this bloat an agent produces varies dramatically by model.\n\n
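The three patterns above look something like this in Python. The sketch below is hypothetical: `get_port_bloated`, `is_disabled`, and the other function names are invented for illustration and are not taken from any benchmark. The bloated versions add a `None` check that cannot fire for a caller honoring the `dict` type hint. They wrap `dict.get` in a `try`/`except`, even though `dict.get` does not raise for a string key. They also include a near-duplicate `is_disabled`. The lean versions behave the same for well-typed inputs. Python does not enforce type hints at runtime, which is why a `None` check can still earn its place at a genuine system boundary.

```python
# Hypothetical functions illustrating the bloat patterns described above.

# Bloated versions
def get_port_bloated(config: dict[str, int]) -> int:
    if config is None:  # cannot fire for a caller honoring the dict type hint
        raise ValueError('config is required')
    try:
        return config.get('port', 8080)  # dict.get with a str key does not raise
    except KeyError:  # dead branch: nothing in the try block raises KeyError
        return 8080


def is_enabled_bloated(flags: dict[str, bool], name: str) -> bool:
    return flags.get(name, False)


def is_disabled(flags: dict[str, bool], name: str) -> bool:
    # Near-duplicate of is_enabled_bloated, differing only by the negation.
    return not flags.get(name, False)


# Lean versions: same behavior for well-typed inputs
DEFAULT_PORT = 8080


def get_port(config: dict[str, int]) -> int:
    return config.get('port', DEFAULT_PORT)


def is_enabled(flags: dict[str, bool], name: str) -> bool:
    return flags.get(name, False)

# Callers write `not is_enabled(flags, name)` instead of keeping is_disabled().
```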
Sonar’s [LLM Leaderboard](https://www.sonarsource.com/the-coding-personalities-of-leading-llms/leaderboard/) runs every frontier model through 4,400+ Java tasks and analyses the generated code. To complete the benchmark, GPT-5.4 High generated 1,159,000 lines of code at an 81.05% pass rate, while Claude Opus 4.7 Thinking generated only 336,000 lines, less than a third as much, and achieved a slightly higher 82.52% pass rate. Different models produce dramatically different volumes of code to reach similar outcomes.\n\nBloat is not just messy. [Carnegie Mellon researchers studied](https://arxiv.org/abs/2511.04427) 807 open-source projects that had adopted Cursor, compared them against 1,380 matched control projects, and measured the results with SonarQube. A short-term velocity gain disappeared by month three, while static analysis warnings rose 30% and code complexity rose 41%; both increases persisted. The harder the codebase became to change and the more bugs it contained, the more velocity was dragged down. Any experienced developer knows how this goes: quality problems compound until the code feels impossible to change and the only option left is the dreaded rewrite.\n\nThree forces produce bloat once a model is in use:\n\nWhat closes the gap is a loop around each iteration of agent work. The agent does what it is good at, generating code, and our job is to wrap that with three steps the agent cannot reliably perform on its own. At Sonar we call this the Agent Centric Development Cycle, or AC/DC: guide, verify, solve.\n\nMany teams overcorrect on context. They paste the style guide, three years of architectural decisions, and the entire onboarding doc into the agent’s instructions and expect output to improve. 
[ETH Zurich researchers tested](https://arxiv.org/abs/2602.11988) this and found the opposite: large context files often reduced task success compared with providing no context at all, while adding 20% or more to inference cost.\n\nKeep agent-facing context short (under 200 lines is a useful heuristic) and restrict it to fundamentals that can’t easily be inferred from the code: naming conventions, architectural invariants, and approaches that have been tried and failed. A short standing context only gets you so far, so also provide context specific to each task. If you have architectural expectations, don’t expect the agent to guess them; in this guide phase, software architecture tools can supply additional context.\n\nTask shape matters too. Break the work into steps and agree on a plan; ask the agent to propose three solutions and evaluate the quality impact of each. There is no perfect software architecture, and you understand the trade-offs in your codebase best, so think critically about the changes before they happen. Without this, the agent will confidently pick an option, seemingly at random, and the further it goes the harder that choice is to “unpick.” To test this, ask three instances of your preferred agent to complete a task that involves some polymorphism and watch each one confidently suggest a different solution.\n\nThe most expensive verification mistake is doing it last. Reviewing 200-line pull requests (PRs) after the agent is done is the dynamic behind the [Faros/DORA figures Addy Osmani highlighted](https://addyo.substack.com/p/the-80-problem-in-agentic-coding): high-adoption teams merged 98% more PRs, while review times rose 91%. Verification inside the loop is different. Unit test runs, static analysis, and security scanners produce output the agent can act on. This is where AI-native tooling belongs: purpose-built for the agent to invoke, not just for humans to consult through a UI. Human reviewers alone cannot keep up.\n\n
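The guide, verify, solve cycle described above can be sketched as a small Python loop. This is an illustration under stated assumptions, not Sonar’s tooling. The names `guide`, `verify`, `run_cycle`, and `Finding` are hypothetical, as are the agent and check signatures and the three-round cap. The sketch enforces the 200-line context heuristic and runs automated checks inside the loop rather than after a PR is opened. As the solve step, it feeds each round’s findings straight back to the agent. Whatever remains unresolved is returned for human judgment. In practice each check would wrap a test runner, static analyzer, or security scanner.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative guide/verify/solve loop; every name here is hypothetical.
MAX_CONTEXT_LINES = 200  # the article's heuristic for standing agent context


@dataclass
class Finding:
    check: str    # which verifier reported it (tests, static analysis, scanner)
    message: str  # actionable text the agent can respond to


# Assumed shapes: an agent maps (prompt, open findings) to code;
# a check maps code to a list of findings.
Agent = Callable[[str, list[Finding]], str]
Check = Callable[[str], list[Finding]]


def guide(standing_context: str, task: str) -> str:
    # Guide: a short standing context plus context specific to this task.
    if len(standing_context.splitlines()) > MAX_CONTEXT_LINES:
        raise ValueError('standing context exceeds the 200-line heuristic')
    return f'''{standing_context}

Task: {task}'''


def verify(code: str, checks: list[Check]) -> list[Finding]:
    # Verify: run every automated check inside the loop, not after the PR is opened.
    return [finding for check in checks for finding in check(code)]


def run_cycle(agent: Agent, checks: list[Check], standing_context: str,
              task: str, max_rounds: int = 3) -> tuple[str, list[Finding]]:
    prompt = guide(standing_context, task)
    findings: list[Finding] = []
    code = ''
    for _ in range(max_rounds):
        # First round generates; later rounds are the solve step, with the
        # previous round's findings fed straight back to the agent.
        code = agent(prompt, findings)
        findings = verify(code, checks)
        if not findings:
            break
    # Anything still open after max_rounds is escalated for human judgment.
    return code, findings
```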
When teams merge nearly twice as many PRs per week and each one takes nearly twice as long to review, the total review load grows almost fourfold (1.98 × 1.91 ≈ 3.8), so doubling the review staff still leaves you behind. Automated verification is the only response that scales. Fast feedback has always been a fundamental tenet of good software engineering. Feeding it directly back to the agent protects the developer from simple mistakes and leaves them headroom to work on the harder ones.\n\nIf verification happens within the agentic loop, the agent can fix issues whilst the code is still being generated, without expensive remediation steps later. Static analysis findings can also tell the agent how to resolve each issue quickly. Some cases need human judgment: a `None` check at a system boundary may document a real precondition. But most of the work is mechanical. Automate the obvious fixes and let engineers spend their attention on the cases that are not.\n\nBetter models will keep arriving, but they may not change the mechanism of bloat or the dynamics of compounding decay. The loop is what changes them: bounded tasks, sharp context, in-loop verification, and a deliberate “solve” step that clears bloat before it accumulates.\n\nThe same logic governs how autonomy should expand. Reduce human interventions only when the guide, verify, and solve cycle is demonstrably making them redundant. Our biases can sting us here: an agent’s fluency in writing code can lead us to agree with it more than we should. Don’t trust blindly; wait for the evidence.\n\nThe teams that will be shipping enterprise-quality code with AI agents in 18 months are not the ones running the “best model.” They are the ones treating workflow as the engineering investment, with the seriousness once given to build systems and CI. The model is the tool; the workflow is the discipline. 
That is where the durable advantage compounds.\n\n---\n\n*New Tech Forum* provides a venue for technology leaders, including vendors and other outside contributors, to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to doug_dineley@foundryco.com.", "url": "https://wpnews.pro/news/shipping-enterprise-quality-code-with-ai-agents", "canonical_source": "https://www.infoworld.com/article/4182518/shipping-enterprise-quality-code-with-ai-agents.html", "published_at": "2026-06-16 09:00:00+00:00", "updated_at": "2026-06-16 09:20:54.169233+00:00", "lang": "en", "topics": ["ai-agents", "developer-tools", "large-language-models", "ai-safety", "ai-research"], "entities": ["Sonar", "SonarQube", "GPT-5.4 High", "Claude Opus 4.7 Thinking", "Cursor", "Carnegie Mellon", "ETH Zurich"], "alternates": {"html": "https://wpnews.pro/news/shipping-enterprise-quality-code-with-ai-agents", "markdown": "https://wpnews.pro/news/shipping-enterprise-quality-code-with-ai-agents.md", "text": "https://wpnews.pro/news/shipping-enterprise-quality-code-with-ai-agents.txt", "jsonld": "https://wpnews.pro/news/shipping-enterprise-quality-code-with-ai-agents.jsonld"}}