Discover Broadly, Implement Narrowly

I have been building software with coding agents, and the work now goes well beyond isolated functions and local fixes. It has increasingly included requirements, architecture, implementation, review, correction and maintenance.

That experience has made me wonder whether many agentic coding workflows are optimising for the wrong finish line.

The usual question is whether the agent completed the task. Did the feature work? Did the tests pass? Did the build succeed? Was existing behaviour preserved?

All of that matters.

But another question keeps intruding: what did completing this task reveal about the architecture of the system?

A change can work perfectly and still expose a deeper problem. It may introduce a second authority path, duplicate lifecycle state, blur a trust boundary, attach evidence to the wrong identity or leave behind code that nobody can confidently repair without returning to another agent.

A passing build proves that the requested change works. It does not prove that the system is becoming safer, simpler, more intelligible or more maintainable.

That distinction matters more as agents produce implementation faster than humans acquire understanding of the resulting system.

What follows is an attempt to sketch a bounded stewardship framework around that problem: one that separates observation from action, classifies what implementation reveals and reserves architectural change for a slower, more deliberate process.

Task completion is not system stewardship

Coding agents are usually given bounded tasks. Add this feature. Fix this bug. Implement this specification. Make the tests pass. Preserve existing behaviour.

Their behaviour is therefore rational. They optimise for task completion and try to avoid unnecessary changes.

In many contexts, that restraint is desirable. Nobody wants an agent to treat every feature request as permission to redesign the repository.

Yet the same restraint can preserve structural weaknesses indefinitely.

An agent may notice that several modules duplicate the same authority rule. It may discover that the object model cannot represent a newly required distinction. It may find that one field is serving simultaneously as draft state, publication state and user visibility. It may realise that a future maintainer will struggle to reconstruct why a particular implementation exists.

Unless the task explicitly authorises architectural review, the agent is usually encouraged to solve the local problem and move on.

The opposite instruction would be just as dangerous. Telling an agent to “always improve the architecture” invites speculative abstractions, opportunistic refactoring, fashionable infrastructure and the repeated reopening of settled decisions.

The answer is not simply to make agents more architectural.

It is to give them wider permission to observe than to act.

Observation authority should be wider than action authority

I find it useful to distinguish between two roles.

The implementer remains tightly bounded. It completes the authorised request, preserves accepted constraints, adds the necessary tests and avoids unrelated refactoring.

The steward is allowed to look more widely. It may ask whether the implementation has revealed a duplicated authority path, a missing trust boundary, lifecycle drift, object-model strain, weak observability, an absent recovery path or evidence that the architecture is working as intended.

The important point is not merely that there are two roles. Software engineering already distinguishes between authors and reviewers, feature work and refactoring, and implementation and architecture.

The more specific claim is that the right to observe should be broader than the right to modify.

Ordinary scope discipline says: do not touch unrelated problems.

Architectural stewardship should say something more demanding: notice relevant problems beyond the immediate change, surface them through a governed channel and do not implement them without separate authority.

The stewardship channel may discover beyond scope. It may not implement beyond scope.

That asymmetry creates room for architectural awareness without turning every coding task into a redesign exercise.
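
One way to picture the asymmetry is as two scopes attached to the same task. The sketch below is illustrative only: `TaskAuthority`, `route` and the path patterns are hypothetical names invented for this example, not part of any existing agent tool.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class TaskAuthority:
    """Hypothetical scope record: what a task may observe versus modify."""
    may_modify: list[str]  # glob patterns the implementer is allowed to edit
    # Observation is wider by default: the steward may look anywhere.
    may_observe: list[str] = field(default_factory=lambda: ["*"])

    def can_modify(self, path: str) -> bool:
        return any(fnmatch(path, pattern) for pattern in self.may_modify)

    def can_observe(self, path: str) -> bool:
        return any(fnmatch(path, pattern) for pattern in self.may_observe)


def route(authority: TaskAuthority, path: str, wants_change: bool) -> str:
    """Decide what happens to something the agent noticed at `path`."""
    if not authority.can_observe(path):
        return "ignore"
    if wants_change and authority.can_modify(path):
        return "implement"
    # Seen, possibly worth changing, but outside action authority:
    # it goes to the stewardship channel and is not edited.
    return "report"


authority = TaskAuthority(may_modify=["src/billing/*"])
print(route(authority, "src/billing/invoice.py", wants_change=True))
print(route(authority, "src/auth/policy.py", wants_change=True))
```

The only design point is that `report` is a distinct outcome. Anything the agent may see but not touch ends up there rather than being silently dropped or silently fixed.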

The human burden changes, but does not disappear

A fair objection is that this simply moves the burden.

Instead of asking a human to understand all the generated code, we ask the human to understand the agent’s architectural findings.

That objection is partly correct.

The process does not remove the need for human judgement. It changes the form of the work.

Rather than expecting a person to reconstruct every architectural consequence from a large generated diff, the agent is required to surface a limited set of grounded observations. Each observation should be tied to a known constraint, concrete evidence, a likely consequence, a confidence level, an owner and a condition under which the concern would be shown to be wrong.
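
As a minimal sketch, such an observation could be a small record that refuses to count as grounded until all six elements are present. The class and field names are my own assumptions, chosen to mirror the list above, and the example values are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical record for one architectural observation."""
    constraint: str    # the known constraint the observation relates to
    evidence: str      # concrete evidence, e.g. files or behaviour seen
    consequence: str   # the likely consequence if the concern is real
    confidence: str    # e.g. "low", "medium" or "high"
    owner: str         # who has the authority to decide
    falsified_if: str  # condition that would show the concern to be wrong

    def missing_fields(self) -> list[str]:
        """Names of fields left blank, in declaration order."""
        return [name for name, value in vars(self).items() if not value.strip()]


draft = Observation(
    constraint="Only one module decides publication state",
    evidence="A second write path sets the published flag directly",
    consequence="Publication rules can drift between the two paths",
    confidence="medium",
    owner="",
    falsified_if="The second path delegates to the first",
)
print(draft.missing_fields())
```

An observation with a non-empty `missing_fields()` result is not yet ready for a human to adjudicate.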

The human burden becomes adjudication rather than total reconstruction.

That is still demanding. But it is a more realistic use of scarce attention than expecting every user of an agentic coding tool to notice every hidden architectural consequence unaided.

This framework does not eliminate the comprehension bottleneck. It tries to compress and structure it.

What counts as architectural evidence?

Not every successful test, preference or hypothetical concern should count as architectural evidence.

A useful definition is this:

Architectural evidence is implementation or operational information that materially increases or decreases confidence in an architectural proposition.

A second, materially different workflow successfully reusing the same transition model is evidence.

Two independent changes duplicating the same authority logic are evidence of pressure.

A production incident exposing ambiguous version binding is evidence of contradiction.

“Microservices might scale better someday” is not evidence.

The distinction matters because architecture discussions often blur preference, possibility and demonstrated strain. The aim is not to produce more commentary. It is to identify what implementation has actually taught us.

Architecture review should not become a ritual of finding faults

Once an agent is instructed to look for architectural problems, it may feel obliged to produce them.

That creates performative criticism. Every change generates another warning, another register entry, another proposed abstraction and another future task.

A credible stewardship process must permit a clean result:

No material architectural pressure was detected. The implementation supports the accepted design within the scope exercised by this change. No follow-up is required. This says nothing about weaknesses in untouched parts of the system.

That final qualification matters. A null result should prevent manufactured concern without creating false reassurance.

Implementation evidence may support the architecture, place it under pressure or contradict it.

Confirmation

Confirmation means that implementation evidence supports the current model.

Perhaps one transition mechanism successfully carries another materially different workflow. Perhaps an immutable version supports several forms of delivery without losing identity. Perhaps restricted content is excluded from user output by construction rather than hidden only in the interface.

Confirmation should be recorded selectively.

A passing build is not automatically architectural confirmation. Confirmation becomes worth preserving when it resolves an open architectural question, validates a materially uncertain design choice or demonstrates successful reuse in a genuinely different case.

Over time, such evidence can move architecture from plausible design toward a better-evidenced and more confidently held design.

Pressure

Pressure means that the architecture still works, but implementation is beginning to expose strain.

A second module may duplicate the same validation rules. Two server entry points may begin repeating authorisation logic. A query strategy may remain workable while becoming increasingly expensive to operate.

Pressure should not authorise immediate redesign.

It should create a watchpoint. The observation should state what was seen, which constraint is affected, what would cause the concern to escalate, what the smallest likely intervention might be and what evidence would show that the concern was overstated.

The progression should be gradual:

A local pressure becomes a watchpoint. Repeated evidence may turn it into a systemic finding. Only then does it become a candidate for an explicit architectural decision.

This is close to familiar ideas such as YAGNI and the rule of three, but made durable across agent sessions and implementation cycles.

Pressure should escalate only when additional evidence appears. That might include recurrence across independent changes, rising maintenance cost, divergence in policy, a temporary workaround becoming permanent, increasing difficulty enforcing a known constraint, an operational incident or failure of the original falsification test.
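
The progression can be sketched as a watchpoint that changes status only when evidence arrives from a new, independent change. The thresholds of two and three are assumptions in the spirit of the rule of three, and `Watchpoint` and the change identifiers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Watchpoint:
    """Hypothetical record of one pressure observed over several changes."""
    concern: str
    systemic_at: int = 2   # assumed: two independent sightings
    decision_at: int = 3   # assumed: three independent sightings
    sightings: set[str] = field(default_factory=set)  # ids of changes

    def record(self, change_id: str) -> str:
        """Add evidence from a change; repeats of the same change do not count."""
        self.sightings.add(change_id)
        return self.status()

    def status(self) -> str:
        count = len(self.sightings)
        if count >= self.decision_at:
            return "candidate-decision"
        if count >= self.systemic_at:
            return "systemic-finding"
        return "watchpoint"


wp = Watchpoint("Authorisation logic repeated across server entry points")
for change in ["change-101", "change-101", "change-117", "change-130"]:
    print(change, wp.record(change))
```

Counting distinct changes is only one of the escalation signals listed above. Rising maintenance cost or an operational incident would need their own inputs.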

The purpose is not to suppress architectural change. It is to require evidence before local strain becomes system-wide redesign.

Contradiction

Contradiction means that the implementation, the requirement and the accepted model cannot all remain as stated.

For example, evidence may be attached to a mutable identity instead of the exact issued version. Restricted content may reach a user payload and be hidden only by presentation. A convenience write may bypass the recognised authority for publication.

A contradiction requires adjudication. It does not predetermine the answer.

The implementation may be wrong. The architecture may contain a genuine gap. The requirement may be invalid. The evidence may be stale or incomplete. The issue may simply belong to another piece of work.

A contradiction should therefore resolve into one of several outcomes:

Correct the implementation because the architectural rule remains valid.

Clarify the architecture because the problem is ambiguity rather than failure.

Revise the architecture because a legitimate requirement cannot be represented.

Reject the requirement because it violates accepted constraints.

Invalidate the evidence because it was stale, incomplete or misinterpreted.

Transfer the issue because it is valid but belongs elsewhere.

The first question is not, “How should we redesign the system?”

It is, “What exactly is contradicted, and which source of truth should yield?”
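
A small sketch of that ordering: the adjudication record below cannot be created until someone has stated what is contradicted and who decided. The six outcomes are the ones listed above, while the names `Outcome` and `Adjudication` and the wording of each value are my own.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    """Each value names the source of truth that yields (illustrative wording)."""
    CORRECT_IMPLEMENTATION = "implementation"
    CLARIFY_ARCHITECTURE = "architecture wording"
    REVISE_ARCHITECTURE = "architecture model"
    REJECT_REQUIREMENT = "requirement"
    INVALIDATE_EVIDENCE = "evidence"
    TRANSFER = "none here (routed to another owner)"


@dataclass(frozen=True)
class Adjudication:
    contradicted: str  # what exactly is contradicted
    outcome: Outcome
    decided_by: str    # the person or role that made the call

    def __post_init__(self) -> None:
        if not self.contradicted.strip():
            raise ValueError("state what is contradicted before choosing an outcome")
        if not self.decided_by.strip():
            raise ValueError("an adjudication needs a named decider")

    @property
    def yields(self) -> str:
        return self.outcome.value


decision = Adjudication(
    contradicted="Evidence is attached to a mutable identity, not the issued version",
    outcome=Outcome.CORRECT_IMPLEMENTATION,
    decided_by="lifecycle owner",
)
print(decision.yields)
```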

Three possible effects

Confirmation means that evidence supports the model. Its normal consequence is increased confidence, usually without a new task.

Pressure means that the model still works but is showing strain. Its consequence is a watchpoint with an owner and an escalation threshold.

Contradiction means that the implementation, requirement and accepted model cannot all remain as stated. Its consequence is to stop and adjudicate before changing either the implementation or the architecture.

Confirmation builds confidence. Pressure creates a watchpoint. Contradiction requires adjudication. None of them, by itself, authorises redesign.
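
Put as a lookup, the classification might look like the sketch below. `Effect`, `NEXT_STEP` and the two functions are hypothetical names. `None` stands for the explicit null result described earlier.

```python
from enum import Enum
from typing import Optional


class Effect(Enum):
    CONFIRMATION = "confirmation"
    PRESSURE = "pressure"
    CONTRADICTION = "contradiction"


# What each classification leads to. None is the clean, scope-limited result.
NEXT_STEP = {
    None: "record a scoped null result; no follow-up",
    Effect.CONFIRMATION: "raise confidence; usually no new task",
    Effect.PRESSURE: "open a watchpoint with an owner and an escalation threshold",
    Effect.CONTRADICTION: "stop and adjudicate before changing code or architecture",
}


def next_step(effect: Optional[Effect]) -> str:
    return NEXT_STEP[effect]


def authorises_redesign(effect: Optional[Effect]) -> bool:
    """No classification, by itself, grants permission to redesign."""
    return False


for effect in [None, *Effect]:
    print(effect, "->", next_step(effect))
```

`authorises_redesign` is deliberately a constant. Redesign has to come from a separate decision, not from the label attached to a single observation.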

Architectural observations need more than one label

“Architecture concern” is too vague to support disciplined decisions.

A useful observation needs several independent dimensions.

Its effect identifies whether the evidence confirms, pressures or contradicts the current model.

Its ownership identifies which part of the system has the authority and responsibility to resolve it. Ownership is not merely a thematic label. It routes the finding to a decider.

The exact ownership boundaries will vary by system. One project may distinguish between canonical product truth, lifecycle authority and operational user reality. Another may organise responsibility differently. The general requirement is simply that findings be routed to the authority that owns the relevant truth.

Boundary cases are often the most dangerous: user data entering a canonical registry, a delivery following a mutable current version, an interface representing approval it does not possess or a retrieval system being treated as an authority system.

The finding also needs a category. Is this authority duplication, lifecycle drift, premature abstraction, weak recovery, documentation-runtime divergence or something else?

Severity and confidence should remain separate.

A possible user-data exposure may have catastrophic consequences while still being based on incomplete evidence. That is high severity and medium confidence, not a medium concern.

Finally, a material finding needs a disposition, an owner and a revisit trigger.

A finding without an owner becomes durable clutter.

A deferred finding without a trigger becomes permanent neglect.
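
These dimensions can be sketched as a record that keeps severity and confidence as separate fields and flags the two failure modes just described. All names and the disposition strings are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Level(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class Finding:
    category: str      # e.g. "authority duplication", "lifecycle drift"
    severity: Level    # how bad the consequence would be
    confidence: Level  # how well the evidence supports the concern
    disposition: str   # assumed values: "fix", "defer", "accept"
    owner: Optional[str] = None
    revisit_trigger: Optional[str] = None


def label(finding: Finding) -> str:
    """Report both dimensions; never average them into one score."""
    return f"{finding.severity.value} severity, {finding.confidence.value} confidence"


def problems(finding: Finding) -> list[str]:
    issues = []
    if not finding.owner:
        issues.append("no owner")
    if finding.disposition == "defer" and not finding.revisit_trigger:
        issues.append("deferred without a revisit trigger")
    return issues


exposure = Finding("possible user-data exposure", Level.HIGH, Level.MEDIUM, "defer")
print(label(exposure))
print(problems(exposure))
```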

Grounding comes before judgement

A sophisticated architectural review is worthless when it is grounded in the wrong repository state.

We learned this directly.

An independent review once produced a detailed and plausible recommendation to simplify an architecture. Its reasoning was coherent. Its tables were convincing. Its recommendations sounded appropriately senior.

The review was also based on a stale checkout.

The reviewer concluded that several architectural concepts did not exist in the repository and were therefore likely speculative overlays. Once the review was rerun against the verified canonical branch, those concepts were found throughout the accepted documentation. The earlier conclusions were withdrawn.

The problem was not a lack of intelligence. It was that the intelligence had been applied to the wrong system.

Architectural intelligence is downstream of environmental truth.

A stewardship process must therefore begin by verifying the repository, branch, worktree, current commit and authoritative documentation. It must distinguish between what is implemented, planned, deferred and superseded.
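
The repository part of that check is easy to mechanise. The sketch below assumes the expected branch and commit are supplied by whoever commissions the review. `RepoState`, `read_state` and `grounding_errors` are hypothetical names, and the only git commands used are `rev-parse` and `status --porcelain`. It does not cover verifying the authoritative documentation.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class RepoState:
    branch: str
    commit: str
    dirty: bool  # True when the worktree has uncommitted changes


def read_state(repo_dir: str) -> RepoState:
    """Ask git what is actually checked out in `repo_dir`."""
    def git(*args: str) -> str:
        result = subprocess.run(
            ["git", "-C", repo_dir, *args],
            check=True, capture_output=True, text=True,
        )
        return result.stdout.strip()

    return RepoState(
        branch=git("rev-parse", "--abbrev-ref", "HEAD"),
        commit=git("rev-parse", "HEAD"),
        dirty=bool(git("status", "--porcelain")),
    )


def grounding_errors(actual: RepoState, expected_branch: str,
                     expected_commit: str) -> list[str]:
    """Compare the checkout with what the review is supposed to be about."""
    errors = []
    if actual.branch != expected_branch:
        errors.append(f"on branch {actual.branch!r}, expected {expected_branch!r}")
    # Accept an abbreviated hash, but never an empty one.
    if not expected_commit or not actual.commit.startswith(expected_commit):
        errors.append("checked-out commit does not match the expected commit")
    if actual.dirty:
        errors.append("worktree has uncommitted changes")
    return errors
```

A review would call `grounding_errors(read_state("."), ...)` first and refuse to proceed while the list is non-empty.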

Humans often inherit this grounding from the workspace already in front of them. Agents may not. They may be pointed at a stale checkout, incomplete context or an outdated branch.

Grounding prevents one important class of error: reasoning accurately about the wrong system.

It does not prevent a well-grounded reviewer from reasoning badly.

A steward can still mistake intentional duplication for accidental complexity, classify a necessary abstraction as premature or recommend simplifying away a valid boundary.

Grounding is necessary, but not sufficient. Falsification, independent review and repeated implementation evidence still matter.

The process must remain proportional

Not every change deserves the same architectural ceremony.

A wording correction should not trigger a full object-model and trust-boundary review. A new server-authoritative writer probably should.

A light review is appropriate for copy, styling, isolated visual fixes, tests, non-authoritative documentation and low-risk internal refactoring with unchanged contracts.

A focused review is appropriate where a change affects persistence, shared validation, dependencies, public interfaces, reusable components or cross-module behaviour.

A full stewardship review is appropriate where a change affects authentication, authorisation, user data, lifecycle state, publication, authoritative writers, restricted content, external integrations, schema migration, irreversible operations or trust boundaries.

The human owner of the work should choose the review level before implementation. The agent may recommend escalation if implementation exposes a consequence that was not visible initially.

When the scope is genuinely low-risk and no authority, privacy, lifecycle, persistence or irreversibility trigger is present, the lighter review is appropriate.

When the classification itself is uncertain, the review should move up one level rather than defaulting automatically to either the lightest or most elaborate process.
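
As a rough sketch, the selection rule could be written as follows. The trigger sets are copied from the lists above. The function name, the idea of tagging a change with the areas it touches and the `uncertain` flag are assumptions.

```python
from enum import IntEnum


class Review(IntEnum):
    LIGHT = 1
    FOCUSED = 2
    FULL = 3


FOCUSED_TRIGGERS = {
    "persistence", "shared validation", "dependencies",
    "public interfaces", "reusable components", "cross-module behaviour",
}
FULL_TRIGGERS = {
    "authentication", "authorisation", "user data", "lifecycle state",
    "publication", "authoritative writers", "restricted content",
    "external integrations", "schema migration", "irreversible operations",
    "trust boundaries",
}


def review_level(touches: set[str], uncertain: bool = False) -> Review:
    """Pick a review level from the areas a change touches."""
    if touches & FULL_TRIGGERS:
        level = Review.FULL
    elif touches & FOCUSED_TRIGGERS:
        level = Review.FOCUSED
    else:
        level = Review.LIGHT
    if uncertain:
        # Uncertain classification moves up exactly one level, capped at FULL.
        level = Review(min(level + 1, Review.FULL))
    return level


print(review_level({"copy"}).name)
print(review_level({"copy"}, uncertain=True).name)
print(review_level({"persistence", "schema migration"}).name)
```

In practice the human owner still chooses the level. A function like this could only suggest a default or flag a choice that looks too light.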

The aim is not more ceremony. It is architectural attention where architectural consequences are plausible.

Three loops operating at different speeds

The framework becomes clearer when separated into three loops.

The delivery loop asks whether the authorised change was completed correctly. It covers specification, implementation, testing, review and merge.

The stewardship loop asks what the change revealed about the architecture and its operation. It covers grounding, inspection, classification, constraint, verification, ownership and selective recording.

The maturity loop asks what accumulated evidence across several changes implies about the architecture itself. It compares observations, commissions independent review when needed and decides whether to revise or confirm the model.

This separation matters.

A local stewardship finding should not directly redesign the architecture. It should be classified, bounded and recorded.

Architectural revision belongs to the slower maturity loop, where several observations can be compared, challenged and adjudicated together.

Existing practices provide many of the pieces

None of this begins from nothing.

Evolutionary architecture and fitness functions support incremental change while protecting important architectural characteristics. Decision records preserve why consequential choices were made. Platform guardrails and policy-as-code encode known constraints. Specification-driven development gives coding agents clearer intent before implementation.

The “two hats” principle separates refactoring from behavioural change. It protects the clarity of the current action, but it does not create a system-wide observation channel whose right to inspect is wider than its right to modify.

Technical-debt registers preserve known liabilities, but they do not necessarily distinguish between confirmation, pressure and contradiction, or require repeated evidence before local strain becomes architectural redesign.

YAGNI and the rule of three discourage premature generalisation. Pressure aggregation tries to make that intuition persistent across agent sessions.

Risk-management practices already separate consequence from confidence. Applying that distinction to architectural findings prevents uncertain high-impact concerns from being minimised or treated as settled fact.

The ingredients are familiar. The question is whether agentic coding requires them to be assembled differently.

What is specifically agentic about the problem?

The underlying failures are not new. Software teams have always created weak boundaries, duplicated authority, premature abstractions, hidden state and maintenance debt.

Agents amplify the problem in at least three ways.

First, implementation can move faster than comprehension.

Second, agents usually operate through bounded tasks that reward successful local completion rather than stewardship across the life of a system.

Third, agents can sound highly authoritative even when their repository context is incomplete, stale or wrong.

A mature human team may detect architectural pressure through design conversations, code review, maintenance experience and institutional memory. A solo developer or small team using agents may produce a similar volume of change without those distributed safeguards.

Human review may therefore become insufficient not because people are less capable, but because the volume of generated change and the number of implicit decisions exceed the attention available to reconstruct them.

This framework does not assume that agents should replace architects.

It assumes that agentic coding makes architectural observation too important to leave entirely to spontaneous human intuition.

Limitations and failure modes

The process could become too heavy.

Agents might overproduce findings. Humans might stop reading them. Pressure could become a warehouse for vague concerns. Classification could create false precision. Existing doctrine could be used to block legitimate change. A stewardship gate could preserve a flawed architecture simply because it is already accepted.

A null result could also create false reassurance.

“This change revealed no material pressure” does not mean that the wider architecture is sound.

A process triggered by individual changes cannot detect every defect in untouched parts of the system. Periodic independent review remains necessary.

For that reason, I would begin manually. Apply the process to several consequential changes. Observe where it catches real problems and where it creates noise. Revise it from evidence. Only then should it become a reusable agent skill or automated gate.

What I am not claiming

I am not claiming that this framework is wholly original.

Its foundations are indebted to evolutionary architecture, decision records, fitness functions, policy-as-code, risk management, structured review and human-in-the-loop control.

I am also not claiming that it has been proven effective.

We do not yet have comparative results, longitudinal maintenance data, false-positive rates or evidence that the process improves outcomes across different teams and repositories.

This is a position, not a result.

The potentially unusual part is the synthesis: observation authority wider than modification authority; broad discovery combined with narrow implementation; implementation evidence classified as confirmation, pressure or contradiction; explicit null results; verified repository grounding; pressure accumulated before abstraction; contradiction adjudicated before doctrine changes; ownership and revisit triggers; and a slower maturity loop above individual coding tasks.

The nearest existing practices address many of these concerns individually.

Whether combining them into one recurring, agent-facing stewardship loop proves useful remains an open question.

Why I think it is worth exploring

Coding agents increasingly allow people to create systems whose implementation complexity exceeds their ability to see every architectural consequence unaided.

The answer cannot be to require every user to become a senior software architect before using an agent.

Nor can the answer be to let the agent decide autonomously what the architecture should become.

A more realistic goal is to design an environment that forces consequential questions to surface at the right moments and presents them in a form that a human can adjudicate.

The human burden changes from noticing every hidden architectural danger to evaluating grounded, classified and falsifiable observations.

Human authority remains. It simply becomes more realistically exercisable.

The missing capability in agentic coding may not be another increase in coding intelligence. It may be a governed environment that helps coding intelligence become part of a system people can continue to understand, question, repair and own.

Task completion remains necessary. The argument is that it should no longer be treated as sufficient evidence of system stewardship.

Discover broadly. Implement narrowly. Decide explicitly. Record durably. Revisit on evidence.

The question remains open:

How can an agent be required to notice architectural consequences without being allowed to treat every observation as permission to redesign the system?

source & further reading

dev.to (original article)