{"slug": "ai-agent-governance-10-takeaways-from-engineering-leaders-on-agentic-development", "title": "AI Agent Governance: 10 Takeaways from Engineering Leaders on Agentic Development", "summary": "At AI Native DevCon London, engineering leaders from various organizations discussed the shift from individual AI coding tools to enterprise-wide agent governance. The roundtables highlighted that while initial adoption focuses on personal productivity, scaling agentic development requires systems for safety, repeatability, and cost control. Key takeaways include managing fragmentation from bottom-up experimentation and avoiding overhyping productivity to prevent resistance.", "body_md": "Agentic development starts as a productivity story, but at scale it quickly becomes a governance problem.\n\nAt [AI Native DevCon](https://tessl.io/devcon) London, we hosted a set of Chatham House roundtables with senior engineering leaders from a range of organizations. I won’t attribute comments to individuals or companies, but the patterns were strikingly consistent: agentic development is moving from an individual tooling conversation into an enterprise operating model question.\n\nThe first wave was familiar enough: devs tried GitHub Copilot, Cursor, Claude Code, Codex, Devin and similar tools, and many found obvious value. They wrote code faster, produced tests faster, explored ideas faster, and in some cases revived work that had been sitting in the backlog because it was too costly to attempt.\n\nThe interesting question is what happens once agents stop being a personal accelerator and start touching the way an engineering organization works. At that point, the problem shifts from “does the tool help?” to “can we make this safe, repeatable, measurable, and economically sane?”\n\nThat shift is why I think the most useful frame is **AI agent governance**. 
It means the systems that let teams move faster without losing control, including identity, permissions, context, evals, model routing, cost visibility, policy, ownership, and feedback loops.\n\nOn a side note, you can hear my talk “skills are the new code”, where I share my personal framework for agent governance and a proposed approach to enterprise agent enablement.\n\nLet’s now look at the 10 main takeaways from our roundtable.\n\nMost organizations seem to start the same way: give developers access to AI coding tools and let the motivated teams run.\n\nThis is the right instinct at the start, because the space is moving too quickly for a purely top-down programme to discover all the useful patterns. Bottom-up energy creates learning quickly. It also surfaces where agents are genuinely useful, rather than where a transformation deck hoped they might be.\n\nBut it also creates fragmentation.\n\nDifferent teams adopt different tools, build different prompts, store skills in different repos, and develop different assumptions about what is safe enough to automate. One group may use agents for test generation, another for code review, another for product specs, another for deployment automation. Before long, the organization can have dozens of useful experiments that don’t yet add up to a system.\n\nThe trick is not to kill the experimentation but to create a path from local learning to shared practice.\n\nThe first wave of adoption was mostly about individual productivity. The next wave has to be about repeatable, governed team workflows. That means rollout phases, clear ownership, a view of which tools are approved for which classes of work, and a way to convert the best local experiments into standards others can reuse.\n\nThis is a familiar pattern from cloud and DevOps: the early adopters prove what is possible, then the platform forms around them. 
The difference this time is that the cycle is much faster, and the unit being governed is not just infrastructure or code, but the agentic workflow itself.\n\nA lot of the public conversation around AI in software development is still framed around productivity.\n\nMany business leaders will look for savings, and it would be naive to pretend otherwise. It is also worth acknowledging that some of this is hard to say openly in a group setting, however intimate. In practice, some leaders will seek to capitalize on productivity by doing the same work with fewer people, reducing costs, or slowing future hiring.\n\nBut the roundtables reinforced a concern I have had for a while: if we hype AI productivity too aggressively, we may slow adoption by making people fear what adoption means.\n\nIf the internal narrative is mostly about headcount reduction, people will defend themselves. They may hide the real gains, avoid showing how much faster a workflow became, or keep their best agent patterns private because sharing them feels like making the case for fewer people.\n\nThat is not a cultural foundation for transformation. A better frame is *ambition.*\n\nAgents make prototypes cheaper. They let senior engineers explore ideas that have been trapped behind calendar time. They change the build-versus-buy equation, because a capability that once required an RFP and a vendor project may now be plausible for a small internal team to try.\n\nThis is the version of the story that leaders should emphasize publicly and internally. The question should be “what can we now attempt that we previously would not have attempted?”\n\nThat framing does not deny the economics but it does point them in a healthier direction. The long-term narrative should not be about lowering the floor, but about raising the ceiling. 
If AI is understood as a way to increase ambition rather than quietly reduce capacity, more people will lean in, and the organization is more likely to discover the compounding benefits.\n\nAgents are only as useful as the context they can apply.\n\nThat context includes specs, tests, policies, architecture guidance, product requirements, runbooks, coding conventions, incident patterns, security rules, and domain language. Most organizations already have some of this knowledge, but it is rarely as clean or discoverable as the agentic era requires. Some of it lives in docs, some in Slack, some in tickets, some in code comments, and a great deal of it lives in people’s heads.\n\nIn the pre-agent world, weak documentation was annoying but survivable. A dev could ask the person who knew the system, or learn the convention through review comments. In the agentic world, missing context becomes a direct limit on what the agent can do.\n\nThis is why skills matter.\n\nSkills turn tacit engineering knowledge into reusable context that agents can apply. They are not just prompts with nicer packaging; they are a way to encode how an organization wants work done, from API usage to security checks to writing style to deployment workflow.\n\nThis is also where Tessl’s view of agentic development comes in. If agents are going to participate across the SDLC, organizations need a way to collaboratively develop, discover, evaluate, and improve the context those agents rely on. Skills and evals are two sides of that problem: skills package the knowledge agents need, while evals show whether that knowledge actually improved the outcome.\n\nOnce you see context this way and shift the mental framework from SDLC → CDLC (Context Development Lifecycle, illustrated above), documentation stops being a hygiene task and becomes infrastructure. 
The teams that write down how they work, keep that knowledge current, and make it available to agents will have a structural advantage over teams that treat context as tribal knowledge.\n\nModel costs are becoming real.\n\nIn the earliest adoption phase, many teams did not feel the cost directly. Usage was limited, pilots were small, and in some cases vendor pricing or subsidies made the economics look less material than they would eventually become. But that phase is ending…\n\nAs agents become part of daily development, cost shows up in more places: large context windows, repeated attempts, long-running tasks, model upgrades, autonomous workflows, and agents that call other tools in loops.\n\nA prompt that is cheap as a one-off experiment can become expensive when it runs across hundreds of devs every day, each with a large repo context, multiple retries, and a frontier model selected by default.\n\nThis is why *AI FinOps* needs to become a real discipline!\n\nThe cloud analogy is useful (but only up to a point). In cloud, cost followed infrastructure usage. In AI, cost follows cognition-like work: reasoning, context, retries, tool calls, evals, and orchestration. That makes it harder to map spend to value, because the bill may be attached to a workflow that saved a week of engineering time, avoided a security incident, accelerated a customer feature, or simply produced three bad attempts before a human rewrote it.\n\nEven in the few weeks since these roundtables took place, awareness of AI costs has increased substantially. That will continue as agent adoption broadens. Leaders will need visibility into where spend goes, which models are used for which tasks, where context is being wasted, and which workflows justify their cost because they improve delivery, quality, risk, or ambition.\n\nThe wrong answer is to suppress usage blindly. 
The better answer is to manage it deliberately: model routing, caching, context discipline, budgets, observability, and evals that help teams know whether cheaper options are good enough.\n\nThere was broad agreement that not every task should use the largest or most expensive frontier model. A good example is how we’ve recently switched Tessl’s default eval model from Sonnet 4.6 to GLM 5.1. The principle is easy to accept, but the operational question is harder: how does an organization know which model is good enough for which job?\n\nThe answer will not be one model - it will be *routing*.\n\nFrontier models will remain valuable for ambiguous reasoning, complex planning, and tasks where the cost of a poor answer is high. Smaller models may be better for bounded, repeatable work where the task is well specified and the output can be validated. Open models have become capable enough that, for many narrow tasks, they may be more than sufficient and much cheaper. Local or private deployments may make sense when data sensitivity, latency, or control matters more than raw capability.\n\nThe risk is that every team solves this independently. One team standardises on Claude Code, another on Cursor, another on Codex, another experiments with open models, and the organization ends up with duplicated eval work and no shared view of quality, cost, or risk.\n\nThis is why model routing belongs inside AI agent governance. The decision should depend on the task, the data, the quality bar, the blast radius, the cost, and the validation available. The real capability is not choosing a favorite model; it is building the measurement and routing layer that lets teams use the right model for the right task.\n\nThe important test is not whether a smaller model works once. 
It is whether it meets the quality bar repeatedly under realistic inputs, with the context and constraints the workflow will actually have in production.\n\nCost is rising, but security is still the concern most likely to limit enterprise adoption.\n\nThe risks are easy to understand once you stop thinking about agents as chatbots and start thinking about them as actors inside the development environment. A coding agent running with a developer’s credentials may be able to access internal repositories, package registries, logs, deployment systems, tickets, customer data, and production-adjacent systems. If that agent can browse the web, install packages, execute scripts, or move data between systems, the blast radius changes materially.\n\nThis does not mean the right answer is to block agents. It means the trust model has to mature.\n\nOne useful mental model from the roundtables was to treat agents like new employees or interns. You would not give an intern every credential and full production access on day one. You would start with a defined scope, observe their work, review their decisions, and expand trust over time. Agents need a version of the same path.\n\nThat path includes identity, entitlements, sandboxing, audit trails, tool restrictions, policy enforcement, and incident response. It also includes a decision about whether the agent acts as the human, as a separate identity, or as a constrained delegated identity. Without that, security teams are left with a choice between approving risky autonomy or blocking usage entirely.\n\nThere is also an important cost dynamic here. In many enterprises, security constraints currently limit usage, which means they also shield the organization from the full cost curve. If only a small number of teams can use agents in limited ways, the token bill remains constrained. 
Once identity, permissions, sandboxing, and audit controls mature, adoption will expand, and costs that were previously hidden by limited rollout will become much more visible.\n\nSo security may be the immediate bottleneck, but cost is waiting behind it.\n\nAgents reduce the cost of implementation, but that does not mean the organization automatically moves faster. It means the bottleneck moves.\n\nIf code becomes cheaper to produce, the relative cost of everything around code increases: product clarity, architecture decisions, security approvals, change management, compliance, release coordination, and cross-team alignment. Several leaders described a version of the same pattern, where teams can now build faster than the organization can decide, approve, or absorb.\n\nThis changes the economics of software delivery.\n\nFor years, engineering organizations optimised heavily against duplication. Build the shared capability once, coordinate across teams, extract commonality, and reuse the platform. That instinct still matters, but the trade-off changes when implementation becomes cheaper and coordination remains expensive. In some cases, duplicating a capability inside a clear domain boundary may be more effective than forcing multiple teams through a shared dependency.\n\nThis is not an argument against architecture. It is an argument for architecture that recognises where the bottleneck has moved.\n\nAgentic development works best when work has clear ownership, limited dependencies, strong tests, and a constrained blast radius. It struggles when success depends on many teams agreeing before anything can move. 
The practical leadership question is therefore not just “how do we make developers faster?” but “what will become the constraint once they are?”\n\nMost organizations already have controls for software delivery: code review, change management, access approval, security review, compliance checks, deployment gates, incident response, and audit logging.\n\nThe problem is that many of those controls were designed for humans.\n\nThey rely on judgement, institutional memory, informal interpretation, or manual process. People know what the policy really means. Reviewers know when something feels risky. Security teams know which exceptions matter. Auditors accept a workflow because they recognise the human pattern behind it.\n\nAgents force these assumptions into the open.\n\nIf a policy is ambiguous, an agent cannot reliably follow it. If a control depends on a human noticing something subtle, it may not scale. If a process is only documented in training material, it is not agent-ready. If an approval exists mainly so another team can find out what is happening, it may need to be redesigned.\n\nThis is governance debt, and agentic development exposes it.\n\nThe answer is not to invent an entirely new governance model from scratch. It is to make existing controls explicit, automated, and measurable. That means clearer policies, better identity systems, structured workflows, automated checks, traceability across agent actions, and evals that test whether the agent is actually following the standards it was given.\n\nYou cannot govern what you cannot see, and you cannot improve what you cannot evaluate. That is why skills, observability, and evals belong in the same conversation as security.\n\nEvery organization adopting agents faces the same tension: how much freedom should teams have?\n\nToo little standardization creates chaos. 
Too much standardization too early kills discovery.\n\nThe roundtables surfaced many examples of parallel experimentation: multiple teams creating skills, multiple repositories collecting prompts, different approaches to code review, different rules for test generation, different ideas about how much autonomy is acceptable. Some duplication happened because teams wanted control. Some happened because they did not know someone else had already solved the problem.\n\nEarly duplication is not always bad. It can be how teams learn. It can reveal which patterns work across different environments, and it can create local champions who are credible because they solved a real problem rather than followed a mandate.\n\nBut local learning only becomes organizational advantage if it becomes visible.\n\nThe healthiest pattern is to let teams experiment, make the work discoverable, then converge deliberately. That requires communities of practice, internal demos, shared repos, skill registries, lightweight review processes, and a platform team that sees its job as amplifying the good patterns rather than suppressing all variation.\n\nThe question is not whether to standardise. The question is *when*. Experimentation should be broad while the organization is learning. Production patterns should become intentional once that learning starts to repeat.\n\nAgentic development changes what great engineering looks like.\n\nIt does not remove the need for engineering skill. If anything, judgement becomes more important. But the work shifts from producing every line of code to defining the task, supplying the context, delegating to agents, verifying the output, integrating the result, and knowing when something is subtly wrong.\n\nSome engineers will thrive in that environment. They are comfortable with ambiguity, orchestration, and context switching. They can hold the goal in their head while inspecting partial outputs. 
They know how to specify, review, and correct without needing to manually produce every detail.\n\nOthers may struggle, especially if their identity is tied primarily to deep, single-threaded implementation or writing every line by hand. That style of work will not disappear, but it will become part of a larger system in which humans increasingly design and supervise the machinery of software creation.\n\nOne analogy that came up in the discussions was the shift from building the furniture to building or operating the factory that builds the furniture. Another is management: working with agents can feel like defining work, delegating it, reviewing the output, and intervening when needed.\n\nThat does not mean every engineer becomes a people manager. It means more engineers will need management-like skills for systems of agents: specification, delegation, verification, feedback, and accountability.\n\nThe emerging role is less “the person who writes all the code” and more “the person who ensures the right system gets built.”\n\n| Blocker | What leaders are seeing | Why it matters |\n|---|---|---|\n| Security | Agents inherit human permissions, touch sensitive systems, browse the web, or act without enough containment. | It limits rollout today, but also defines the trust model for everything that follows. |\n| Cost | Usage grows through larger context windows, repeated runs, frontier models, and always-on workflows. | AI FinOps becomes a durable discipline, not a one-off optimisation project. |\n| Model deployment | Frontier models are powerful, but many enterprise tasks may be better served by smaller, open, or specialised models. | The capability to route work across models becomes more strategic than picking a single model. |\n| Context | Agents need specs, policies, tests, docs, runbooks, examples, and domain language to do useful work reliably. | Context becomes infrastructure, and weak documentation becomes an adoption blocker. 
|\n| Alignment | Implementation gets cheaper, while decisions, approvals, architecture, and cross-team coordination still move at human speed. | The bottleneck moves from writing code to agreeing what should be built and how it should fit. |\n\nMost of the roundtable discussion reinforced what enterprise leaders already feel: agentic development is useful, the tools are improving quickly, and adoption is uneven.\n\nFrom my perspective, three novel points stood out:\n\nThe next generation of engineering teams won’t be defined by how many agents they use, but by how well they govern them.\n\nAt Tessl, this is the approach we’re building towards: agent governance rooted in context, evaluations, and security. A practical place to start is to point your coding agent at the Tessl CLI and ask it to evaluate your context. It is a simple way to assess the quality of your context, understand where the gaps are, and think about what governance will need to cover next.", "url": "https://wpnews.pro/news/ai-agent-governance-10-takeaways-from-engineering-leaders-on-agentic-development", "canonical_source": "https://dev.to/tessl-io/ai-agent-governance-10-takeaways-from-engineering-leaders-on-agentic-development-4ph0", "published_at": "2026-06-26 08:31:25+00:00", "updated_at": "2026-06-26 09:03:57.696457+00:00", "lang": "en", "topics": ["ai-agents", "developer-tools", "large-language-models", "ai-products"], "entities": ["GitHub Copilot", "Cursor", "Claude Code", "Codex", "Devin", "AI Native DevCon", "Tessl"], "alternates": {"html": "https://wpnews.pro/news/ai-agent-governance-10-takeaways-from-engineering-leaders-on-agentic-development", "markdown": "https://wpnews.pro/news/ai-agent-governance-10-takeaways-from-engineering-leaders-on-agentic-development.md", "text": "https://wpnews.pro/news/ai-agent-governance-10-takeaways-from-engineering-leaders-on-agentic-development.txt", "jsonld": 
"https://wpnews.pro/news/ai-agent-governance-10-takeaways-from-engineering-leaders-on-agentic-development.jsonld"}}