Source: dev.to · Topic: ai-agents

I Lead AI Agents Every Day - Here Are 5 Shifts No Standard Tells You How to Make

A Google DeepMind safety lead announced a $10 million investment in multi-agent safety, citing the lack of an established research field. A developer running multi-agent systems in production outlined five critical operational shifts not covered by the new PMI standard for AI in project work, including writing boundary files that define autonomous versus escalated decisions, judging results cold without context on the path, and replacing headcount questions with capability mapping.

4 min read · Published Jun 12, 2026

A Google DeepMind safety lead said this week that they're putting $10M behind multi-agent safety because "there just isn't really a field of research for multi-agent safety yet."

I read that and laughed, because I'm already running the thing the research field doesn't exist for yet. Most of us are. You spin up a couple of agents, hand them work, and somewhere in there you quietly become a manager of workers that don't think like workers.

Two days before that, PMI published the first official standard for AI in project work. It's a solid document. It also leaves the entire "how do you actually do this on a Tuesday" layer to you. So here's my Tuesday layer: five shifts I had to make, each one learned by getting it wrong first.

My first instinct with an agent was the same as with a person: here's work, go.

That broke the first time an agent made a reasonable decision on something that turned out to be irreversible. It wasn't the agent's fault. I never told it which decisions were one-way doors.

So now the first artifact I write isn't a task list. It's a boundary file. Something like this lives next to the work:

autonomous:
  - reformat, refactor, rename within a module
  - anything reversible with a git revert
escalate:
  - schema changes, public API shape
  - deletes, migrations, anything touching prod data
  - spend over $0 or any external send
on_unsure: stop_and_ask

That file does more for me than any standup. Leadership moved from assigning the work to defining what may be decided without me.
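Here is a minimal sketch of how a harness could enforce a file like that before an agent acts. It is an illustration, not my actual tooling: the tag names (`rename`, `migration`, and so on) are hypothetical stand-ins for the prose rules above, and `classify` is a made-up helper. The one design choice worth copying is the order of the checks: escalation wins over autonomy, and anything unrecognized falls through to `on_unsure`.

```python
from enum import Enum
from typing import Iterable


class Decision(Enum):
    AUTONOMOUS = "autonomous"
    ESCALATE = "escalate"
    STOP_AND_ASK = "stop_and_ask"


# Hypothetical tags standing in for the prose rules in the boundary file.
BOUNDARY = {
    "autonomous": {"reformat", "refactor", "rename", "git_revertible"},
    "escalate": {
        "schema_change", "public_api", "delete", "migration",
        "prod_data", "spend", "external_send",
    },
    "on_unsure": "stop_and_ask",
}


def classify(action_tags: Iterable[str], boundary: dict = BOUNDARY) -> Decision:
    """Decide whether an action may proceed without a human."""
    tags = set(action_tags)
    # Any escalation tag wins, even if the rest of the action looks harmless.
    if tags & boundary["escalate"]:
        return Decision.ESCALATE
    # Autonomous only when every tag is explicitly allowed.
    if tags and tags <= boundary["autonomous"]:
        return Decision.AUTONOMOUS
    # Unknown or untagged work falls back to the on_unsure rule.
    return Decision(boundary["on_unsure"])
```

A rename that also touches a migration gets escalated, and a tag the file has never heard of stops and asks rather than guessing.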

I used to review work I'd seen get built. I knew the steps, so "looks right" was usually safe.

Then I started getting finished diffs with no memory of how they came to be. "Looks right" stopped being safe. The code was clean and the reasoning under it was wrong in a way you only catch if you go digging.

The skill now is judging a result cold, with zero context on the path. Ethan Mollick wrote this week about a model holding twelve hours of focus on one spec. When the attention window outlasts mine, my job isn't checking steps. It's scoping the spec so tightly the steps don't need a babysitter.

"How many engineers do I need" is a question I catch myself asking and kill.

The real one: what mix of people and agents produces this outcome, and what's the human-only core I'd never hand off? The plan turned into a capability map with a deliberately protected center.

Gergely Orosz's June job-market analysis lands in the same place from the data side: the roles that compound are where judgment about AI systems is the scarce input, not execution on a known stack. Capability planning is that judgment pointed at your own team.
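One way to make a capability map concrete is as a small data structure with the protected center marked explicitly. This is a sketch under my own assumptions: the `Capability` fields, the owner labels, and the example entries are all hypothetical, not taken from any standard or tool.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Capability:
    name: str
    owner: str               # hypothetical labels: "human", "agent", or "mixed"
    protected: bool = False  # True marks the human-only core


def core_violations(capabilities: Iterable[Capability]) -> List[str]:
    """Return protected capabilities that are not fully human-owned."""
    return [c.name for c in capabilities if c.protected and c.owner != "human"]


# Illustrative entries only; a real map comes from your own outcome.
plan = [
    Capability("module refactors", owner="agent"),
    Capability("test authoring", owner="mixed"),
    Capability("public API design", owner="human", protected=True),
    Capability("production data changes", owner="mixed", protected=True),
]
```

Running `core_violations(plan)` flags "production data changes", which is the point: the map makes it visible when the protected center has quietly been shared.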

Standup tells you something broke. Which means it tells you late.

Workers that fail unpredictably need the alarm built up front. I keep a short tripwire list, each one a single sentence: if this observable crosses this line, halt and ping me, and here's who owns the ping.

- watch: test_pass_rate
  trip: "< 100% on touched files"
  action: halt + page me
- watch: files_changed
  trip: "> 20 in one task"
  action: halt for scope review

It feels too simple to matter. It has saved more bad mornings than any dashboard I've built.
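The tripwire list translates into a few lines of code. The sketch below is one possible shape, not a real library: `Tripwire`, `check`, and the `owner` field are hypothetical names, and treating a missing observable as a trip is my assumption (no signal is not the same as a pass).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Tripwire:
    watch: str                     # name of the observable
    trip: Callable[[float], bool]  # returns True when the line is crossed
    action: str                    # what happens when it fires
    owner: str                     # hypothetical field: who owns the ping


TRIPWIRES = [
    # test_pass_rate is a fraction here, so "< 100%" becomes "< 1.0".
    Tripwire("test_pass_rate", lambda v: v < 1.0, "halt + page me", owner="me"),
    Tripwire("files_changed", lambda v: v > 20, "scope review", owner="me"),
]


def check(observed: Dict[str, float],
          tripwires: Sequence[Tripwire] = TRIPWIRES) -> List[Tripwire]:
    """Return every tripwire that fired for this set of observations."""
    fired = []
    for wire in tripwires:
        # Assumption: a missing observable counts as tripped.
        if wire.watch not in observed or wire.trip(observed[wire.watch]):
            fired.append(wire)
    return fired
```

Call `check` after each agent task with whatever metrics the run produced, and halt on a non-empty result before anything merges.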

This is the one that's actually a promotion.

Ownership used to mean the outcome is mine. It still is. The level changed. I don't own the deliverable directly anymore. I own the system that makes it: people, agents, and the rules between them. That's the only level that scales.

Boris Cherny, who runs Claude Code, said this week he hasn't written a line of code himself in eight months. People hear a flex. I hear the shift in one sentence: stopped producing the work, started owning the system that produces it. Bigger job, not a smaller one.

I'm not clean on all five. Solid on three, shaky on two, and the shaky ones cost me the most.

Rate yourself one to five on each, fast. The two you score lowest are the two behaviors that move you this quarter. Which one did you make first, and which are you still avoiding?

Tags: #projectmanagement #ai #career
