I have been thinking a lot about coding agents lately.
Not really about whether they can write good code, because usually they can, sometimes they can't. That part is obvious. But the risk is shifting from wrong answers to wrong outcomes.
The part that feels more important to me is this:
should the agent actually own the write authority?
We already don't trust humans without roles, limits, reviews, and accountability. Developers use PRs, pilots use checklists, bank clerks have transfer limits. Capable agents need the same structure, but machine-readable.
Right now a lot of setups still look roughly like this:
I don't think this is the right default.
The agent can reason.
The agent can inspect files.
The agent can propose changes.
But the moment it can directly create external impact, the problem changes.
It is no longer just:
did the agent say something wrong?
It becomes:
did the agent create the wrong outcome?
That is a much more expensive failure mode.
The pattern I like more is simple:
So the agent does not get the write credentials.
It submits a structured intent instead, which could look like:
{
"operation": "write",
"target": {
"repo": "example/app",
"branch": "main",
"path": "docs/config/agent-policy.md"
},
"source_state": {
"blob_sha": "8f31c2..."
},
"requested_effect_hash": "sha256:..."
}
This is then not a command anymore, it is a suggestion, or an intent.
The system still has to decide whether this proposed outcome should exist.
That decision layer can check things like:
Only after that should there be an outcome.
For example:
{
"decision": "admitted",
"checks": {
"scope": "pass",
"source_state": "pass",
"policy": "pass",
"idempotency": "pass"
},
"outcome": {
"type": "pull_request",
"status": "created",
"reviewable": true
}
}
The core rule is:
No impact without admission.
The flow would look like this:
A sandbox is useful.
But I think it solves a different problem.
A sandbox asks:
A gateway asks:
should this concrete proposed outcome exist?
That difference matters because a sandbox can stop escape, it does not decide whether a proposed outcome should exist.
If the agent has a valid GitHub token inside the allowed environment, it can still use allowed tools to create an unwanted result.
The action can be technically allowed and still be the wrong outcome.
That is why I think the boundary should sit between intent and impact, not only around execution.
Sandbox isolates execution.
Gateway isolates impact.
GitHub already has a good human pattern:
a change proposal is not a merge.
Pull Requests are familiar because they are reviewable and they fit how developers already work.
But with agents there is one step before the PR that also matters:
An agent proposal should not automatically become PR impact.
A PR is already a real side effect.
So the agent should not directly create it with its own write token.
The flow I want is more like this:
The adapter is not the authority, it only materializes admitted work.
And the agent never receives the GitHub write credentials.
This is important:
That narrower claim is the whole point. I think many agent systems mix three things together:
But these should be separated:
I don't think production agent systems will be trusted just because the models get smarter. They will be trusted when the path from agent work to external change becomes explicit.
For every real outcome, I want to be able to ask:
That is the layer I have been working on with Impact Boundary Labs.
The first implementation is GitHub-first:
agents can read repositories directly, but write intents go through a deterministic gateway that creates reviewable Pull Requests with evidence.
GitHub is not the whole idea, it is just the first concrete place to prove the pattern, because repositories have clear state, branches, commits, PRs, and review.
The broader principle is:
Let agents reason.
Stop them at intent.
Control what becomes outcome.
Project: Impact Boundary Labs