# Once an AI Agent Removes Typing, Intent Becomes the Bottleneck

> Source: <https://pub.towardsai.net/once-an-ai-agent-removes-typing-intent-becomes-the-bottleneck-9b957ba9be95?source=rss----98111c9905da---4>
> Published: 2026-06-24 04:51:49+00:00

When a coding agent can produce a working module faster than a person can type it, the slow step becomes knowing exactly what you want and saying it precisely. That sounds obvious until you look at what “precisely” turns out to mean in practice, which is close to the opposite of the careful, templated prompt-engineering most teams are taught to write.

We pulled every human-typed prompt from a month of agent-driven development: 1,263 unique prompts across one engineer’s build. Then we pulled the token bill, the merge history, and the defect proxies for the same work. The three datasets tell one story. The expensive skill in agentic coding is clarity of intent, and the cost structure and the velocity both follow from it.

The popular image of prompt engineering is a long, formatted, grammatical block that opens with “You are an expert.” None of the prompts that moved real work looked like that. The median prompt was 78 characters, about 13 words. Two-thirds were under 120 characters. Some 86% started in lowercase, and the corpus was full of typos, abbreviations, and run-ons. It was a long tail of tiny steering messages (“yes, baseline, add todos”) punctuated by a minority of dense one-line specs.
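For readers who want to measure their own corpus, here is a minimal sketch of how shape statistics like these could be computed. It is not the instrumentation behind the figures above: the function name `prompt_shape`, the 120-character cutoff parameter, and the three sample prompts are illustrative assumptions.

```python
from statistics import median


def prompt_shape(prompts: list[str], short_cutoff: int = 120) -> dict[str, float]:
    """Summarize the shape of a prompt corpus (hypothetical helper)."""
    if not prompts:
        raise ValueError("need at least one prompt")
    lengths = [len(p) for p in prompts]
    n = len(prompts)
    return {
        # Median length in characters and in whitespace-separated words.
        "median_chars": median(lengths),
        "median_words": median(len(p.split()) for p in prompts),
        # Share of prompts shorter than the cutoff.
        "share_short": sum(length < short_cutoff for length in lengths) / n,
        # Share of prompts whose first character is a lowercase letter.
        "share_lowercase_start": sum(p[:1].islower() for p in prompts) / n,
    }


# Illustrative sample only, not the real corpus.
sample = [
    "yes, baseline, add todos",
    "first check if this will even work from our setup",
    "You are an expert engineer. Please produce a complete, well-formatted "
    "module with documentation and tests for every public function.",
]
stats = prompt_shape(sample)
print(stats)
```

Run against an export of your own agent transcripts, the same four numbers give a quick read on whether your prompts look like steering messages or like templates.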

The leverage came from how much intent each message carried. Length was beside the point. The prompts that produced shippable work packed some combination of six things into a sentence or two: the outcome (what “done” looks like), the hard constraints (the invariants the model must not cross), the reason behind the constraint, the scope (where it applies), how much autonomy to take, and what the author did not yet know. A prompt like “first check if this will even work from our setup, and note we want read-only here, create must not work, this is for testing only” is sequencing plus a hard constraint plus scope, with no ceremony. The model works out the how.
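One way to make the six elements concrete is to treat them as fields of a small record and ask which ones a given message carries. The sketch below is an illustration, not a tool used in this build: the class name `IntentSpec`, its field names, and the way the example prompt is mapped onto fields are all our own assumptions.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class IntentSpec:
    """The six things a high-leverage prompt can carry (hypothetical model)."""

    outcome: Optional[str] = None                          # what "done" looks like
    constraints: list[str] = field(default_factory=list)   # invariants not to cross
    reason: Optional[str] = None                           # why the constraint exists
    scope: Optional[str] = None                            # where it applies
    autonomy: Optional[str] = None                         # how much latitude to take
    unknowns: list[str] = field(default_factory=list)      # what the author does not know

    def carried(self) -> list[str]:
        """Names of the elements this spec actually states."""
        return [f.name for f in fields(self) if getattr(self, f.name)]

    def missing(self) -> list[str]:
        """Names of the elements left unstated."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# One possible reading of the read-only example prompt quoted above.
spec = IntentSpec(
    constraints=["read-only", "create must not work"],
    scope="testing only",
    unknowns=["will this even work from our setup?"],
)
print(spec.carried())
print(spec.missing())
```

The point of the exercise is the `missing()` list: a short prompt is fine when the gaps are deliberate, and risky when they are accidental.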

Stating the reason behind a constraint did more work than any formatting. Telling the model “building queries on the fly is risky, that is why we curate them” lets it reject the tempting wrong path on its own, in code it writes an hour later in a file you are not watching. The why transfers the judgment, not just the instruction.

This reframes the skill worth building. Speed is roughly clarity divided by ambiguity. The fast prompts were not well written. They were well aimed. Fuzzy intent gets amplified into confident, wrong code, which costs more than slow code because it looks finished.

The clearest demonstration we have is a hermetic synthetic-data environment, the kind of test substrate that normally takes a sprint to scope. It came up in a morning team meeting. By that afternoon it ran the full data plane end to end on a laptop, isolated from any shared environment, against deterministic data.

The entire feature was specified in two messages. The first carried the source data, the artifact to build, the integration point, and the purpose, in one sentence. The second locked the design: a hard constraint that no real data could leak into the synthetic corpus, a rough sizing, and one question about how the system should know whether to serve synthetic or live data. By name, or by some other mechanism?

That single question shaped the architecture. Because it was asked up front, the implementation reused the routing the system already had, selecting a reserved synthetic tenant, with no special-cased “if synthetic” branch anywhere in the code. One sharp question in the prompt prevented a tangle in the implementation. From the first prompt to a working system was about four hours: roughly 2,700 net lines across two repositories, shipped as a clean stack of small reviewable pull requests rather than one large one, then wired into a local harness that stood up the whole stack. A live request against it surfaced contract mismatches that unit tests had stayed green through, and those were fixed in the same pass.

The transferable part is the spec. Two messages instead of a document. The constraints were stated rather than discovered late (“must not leak”, “env-gated”, “never in prod”), so the guardrails ended up in the code. The whole cross-repository surface was in the agent’s context, so the change spanned repositories without a coordination meeting.
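The routing idea can be sketched in a few lines. This is an assumption-laden illustration, not the code from the build: the tenant ids, the `build_routes` and `fetch` functions, and the environment names are hypothetical. What it shows is the shape described above: the synthetic corpus is just another tenant in the existing routing table, the request path has no “if synthetic” branch, and the environment gate lives at registration time so the reserved tenant does not exist in production.

```python
from typing import Callable

Record = dict[str, str]
Source = Callable[[], list[Record]]

SYNTHETIC_TENANT = "synthetic-reserved"  # hypothetical reserved tenant id


def live_source() -> list[Record]:
    """Stand-in for a real tenant's data source."""
    return [{"id": "1", "origin": "live"}]


def synthetic_source() -> list[Record]:
    """Stand-in for the deterministic synthetic corpus."""
    return [{"id": "s-1", "origin": "synthetic"}]


def build_routes(environment: str) -> dict[str, Source]:
    """Build the tenant routing table once, at startup."""
    routes: dict[str, Source] = {"tenant-a": live_source}
    # Env gate: the synthetic tenant is only registered outside production.
    if environment != "prod":
        routes[SYNTHETIC_TENANT] = synthetic_source
    return routes


def fetch(routes: dict[str, Source], tenant_id: str) -> list[Record]:
    """Request path: plain tenant lookup, no special case for synthetic data."""
    try:
        source = routes[tenant_id]
    except KeyError:
        raise LookupError(f"unknown tenant: {tenant_id}") from None
    return source()


dev_routes = build_routes("dev")
print(fetch(dev_routes, SYNTHETIC_TENANT))
print(SYNTHETIC_TENANT in build_routes("prod"))
```

Because selection happens through the tenant id, every downstream component treats synthetic data exactly like live data, which is what keeps the guardrail in one place.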

Speed at this shape costs money, and the bill draws attention. Ours ran about $4.3K for the month for one engineer’s heavy use. The useful question is whether the spend is efficient and what it returned, not whether the number is large.

It was efficient. Caching ran at a 98% hit rate, which is as good as the mechanism allows, so there was no misconfiguration to fix and no caching action item. About 88% of the cost was reading and writing a large working set at the cheap cached rate, at high volume. The median turn re-read around 360K tokens of context, and that context was loaded up front and barely grew within a session (by a factor of about 1.3). That is the signature of a deliberately large working set, not accumulated bloat. Stale accumulation would have ballooned it three- to fivefold.
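The two diagnostics in this paragraph, cache hit rate and within-session context growth, are simple ratios. The sketch below shows one plausible way to define them from per-turn usage records. The `Turn` fields, the definitions themselves, and the three-turn session are assumptions chosen to reproduce the 98% and 1.3 figures, not data from the build.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """Per-turn input token counts (hypothetical field names)."""

    cached_read_tokens: int      # input served from the prompt cache
    uncached_input_tokens: int   # input processed at the full rate

    @property
    def context_tokens(self) -> int:
        return self.cached_read_tokens + self.uncached_input_tokens


def cache_hit_rate(turns: list[Turn]) -> float:
    """Share of all input tokens that were served from cache."""
    cached = sum(t.cached_read_tokens for t in turns)
    total = sum(t.context_tokens for t in turns)
    return cached / total if total else 0.0


def context_growth(turns: list[Turn]) -> float:
    """Ratio of the last turn's context size to the first turn's."""
    if not turns or turns[0].context_tokens == 0:
        raise ValueError("need a non-empty session with a non-empty first turn")
    return turns[-1].context_tokens / turns[0].context_tokens


# Illustrative session: context loaded up front, then growing slowly.
session = [
    Turn(cached_read_tokens=294_000, uncached_input_tokens=6_000),
    Turn(cached_read_tokens=352_800, uncached_input_tokens=7_200),
    Turn(cached_read_tokens=382_200, uncached_input_tokens=7_800),
]
print(round(cache_hit_rate(session), 3))
print(round(context_growth(session), 2))
```

A growth ratio near 1 with a high hit rate points to a working set that was loaded on purpose. A ratio of 3 to 5 would point to accumulation worth pruning.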

The one real lever is how much context each turn carries, and for a cross-repository system that context is mostly correctness insurance. The failures from an incomplete view are the silent kind. Rename something in one repository without seeing the other two, and resolution falls back to “not found” with no compile error and green unit tests, then surfaces live where it costs ten to a hundred times more to fix. We measured the cheaply offloadable slice, the read-only lookups that can run in an isolated sub-agent, at about 7% of the read cost. Isolating one such task cut its context roughly 27-fold, from about 381K tokens a turn to 14K. The other 93% was reasoning and build turns that need the loaded cross-repository view to be correct.
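The silent rename failure is easy to reproduce in miniature. In the sketch below, two functions stand in for two repositories. All names (`publish_registry`, `resolve`, `CONSUMER_KEY`, the dataset keys) are hypothetical. The producer renames a key, the consumer still asks for the old one, and the lookup quietly degrades to “not found” without raising anything.

```python
def publish_registry(dataset_key: str) -> dict[str, str]:
    """'Repository A' (hypothetical): exposes a dataset under a string key."""
    return {dataset_key: "accounts-v1"}


# 'Repository B' (hypothetical): hard-codes the key it expects.
CONSUMER_KEY = "account_summary"


def resolve(registry: dict[str, str], key: str = CONSUMER_KEY) -> str:
    """Lookup with a soft fallback: a missing key is not an error."""
    return registry.get(key, "not found")


def missing_keys(registry: dict[str, str], required: set[str]) -> set[str]:
    """Contract check that makes the mismatch loud instead of silent."""
    return required - set(registry)


before = publish_registry("account_summary")
after = publish_registry("account_overview")  # renamed in repository A only

print(resolve(before))
print(resolve(after))
print(missing_keys(after, {CONSUMER_KEY}))
```

Each side's unit tests can stay green because each side tests against its own constant. Only a view that spans both sides, whether a loaded cross-repository context or an explicit contract check like `missing_keys`, catches the mismatch before a live request does.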

So the trade is plain. We paid money to buy back calendar time and to insure against silent cross-repo defects. The premium for full context on a build turn is a few cents. One silent defect caught before it reaches QA pays for roughly a month of that premium. Under a hard deadline, where the scarce resource was schedule and not cash, converting dollars into days was the right call. That conclusion is conditional. On a mature codebase with experienced engineers doing incremental work, the published evidence runs the other way (METR’s randomized trial found experienced developers were roughly 19% slower with AI assistance), and the calculus flips.

The return, for greenfield work like ours, is the part that is hard to argue with on the merits. About 96,000 lines of tested, eval-backed, cross-repository system reached dev in roughly three weeks, with 39% of the codebase being tests and evals, a 0.7% revert rate, and near-zero rework on the core. A traditional build of the same artifact is on the order of eight to fifteen engineer-years, which a small team (about five engineers, on those figures) would take one and a half to three years of calendar time to deliver. The token cost to compress that was about $4.3K a month.

The build ran on a plain harness: a coding agent (Claude Code), a frontier model with a 1M-token context window (Opus 4.8), and one engineer who knew the system. There was no custom skill library, no bespoke agent scaffolding, and no orchestration layer on top. The large working set described above is that context window used deliberately, not tooling someone built around it.

This is part of the finding, because the tooling conversation is often a substitute for the harder one. The variable that moved the work was the engineer’s clarity about the target, including what correct looked like and what was still unknown. A team reaching for skill libraries and orchestration before it can state a tight spec is polishing the part that was already cheap. Scaffolding earns its place later, when the same clarity has to repeat across many engineers and many tasks, or to enforce a guardrail a careful person would otherwise hold in their head. For one high-clarity engineer on greenfield work, it was overhead the build did not need.

These numbers come from one engineer’s corpus, and that engineer is a fast, high-clarity communicator. This is the target to learn toward, not a population average, and the prompt-shape and velocity findings may not transfer to a team still building that clarity. The defect proxies are strong (low revert rate, low churn, high test density), but the system is pre-production, so the escaped-defect rate is unproven rather than zero. The effort counterfactual is an order-of-magnitude estimate from artifact size, not a controlled comparison. The whole result sits in the favorable regime for AI codegen: greenfield, well-scoped, zero-to-one work.

What survives all of that is the shape of the skill. When the agent removes typing as the delay, the work that remains is the work a person was always supposed to do and often skipped: deciding exactly what correct looks like, naming the constraints and the reason for each, and being honest about what is still unknown. Teams that treat agentic coding as a prompt-formatting or a tooling problem will optimize the parts that were already cheap. The expensive part is intent, and it does not get cheaper because the typing did.

*The author builds AI and data infrastructure for wealth management at Advisor360°. The figures here come from instrumented agent transcripts, merge history, and defect proxies on a recent cross-repository build, which is why we can be specific about prompt shape, token composition, and rework rather than gesturing at them.*

[Once an AI Agent Removes Typing, Intent Becomes the Bottleneck](https://pub.towardsai.net/once-an-ai-agent-removes-typing-intent-becomes-the-bottleneck-9b957ba9be95) was originally published in [Towards AI](https://pub.towardsai.net) on Medium.
