[ARTICLE · art-38997] src=outofcontext.dev ↗ pub= topic=ai-safety verified=true sentiment=· neutral

Cursor auto-review vs. YOLO – picking the middle safety tier

Cursor and Anthropic have introduced middle-tier safety modes for AI coding agents, replacing the binary choice between constant approval prompts and unrestricted execution. Cursor's Auto-review, shipped May 29, 2026, applies an allowlist, sandbox, and classifier to shell, MCP, and fetch calls, while Anthropic's Claude Code auto mode, released March 24, 2026, uses safeguards to monitor actions before execution. These features aim to reduce approval fatigue and prevent credential exfiltration and destructive operations documented in prior incidents.

Published Jun 25, 2026 · 9 min read

Agent sessions that touch builds, tests, and MCP can stack dozens of approval prompts. When every shell invocation requires a click, the practical choice narrows: babysit the run, or limit agents to single-file edits.

Flip to Run Everything — what Cursor used to call YOLO mode — and the prompts disappear. So does any pre-execution review. Vendor docs and public incident write-ups describe the downside: credential exfiltration, destructive filesystem operations, and unintended pushes to production remotes. Those outcomes are documented failure modes, not hypotheticals invented for effect.

Cursor and Anthropic (Claude Code) now treat the old binary as insufficient. Cursor 3.6 shipped Auto-review on May 29, 2026 (changelog). Anthropic shipped Claude Code auto mode on March 24, 2026 — a permissions mode “where Claude makes permission decisions on your behalf, with safeguards monitoring actions before they run” (Auto mode for Claude Code). Later Claude Code v2.1.178+ releases added subagent-specific classifier checkpoints (spawn-time, per-action, and return review); the top-level mode selector is documented separately in permission modes.

Scope note: Behavior below follows vendor documentation as of June 2026. Settings paths, tier availability, and classifier outcomes can change between releases; verify against current docs before adopting a default on production-adjacent repos. For subagent-heavy Claude Code workflows, see [when to let Claude write the harness]; harness trust is adjacent but not the focus here.

The sane default is not “ask always” or “ask never.” It is a middle tier configured once and revisited when the repo or threat model changes.

## Why the binary failed

Approval fatigue is the obvious failure mode. Long agent runs need dozens of tool calls — reads, builds, test reruns, MCP lookups. If every shell invocation stops for a click, teams either babysit the session or abandon agents for anything beyond a one-file edit.

YOLO regret is the other side. Run Everything in Cursor passes every tool call through with no classifier and no sandbox in the loop. Claude Code’s `bypassPermissions` mode is the same shape: everything runs, including destructive ops, unless explicit deny rules are wired. That profile fits a disposable container. It is a poor default on a laptop with SSH keys, cloud credentials, and a `main` branch that deploys.

What practitioners actually wanted: longer uninterrupted runs with something between them and `curl | bash`. Not a security guarantee (both vendors are explicit that classifiers are probabilistic), but a filter that catches obvious bad calls and sandboxed execution for the rest.

## Cursor Auto-review: allowlist → sandbox → classifier

Auto-review is the default Run Mode for new Cursor users as of 3.6. Existing users enable it under Settings → Cursor Settings → Agents → Run Mode (labeled Approvals & Execution in the 3.6 changelog).

It applies to Shell, MCP, and Fetch tool calls. Every call walks three checks in order:

1. Allowlist. Commands on the terminal allowlist or MCP tools on the MCP allowlist run immediately: no prompt, no sandbox.
2. Sandbox. If the call can run inside Cursor’s sandbox (macOS, Linux, or Windows via WSL2), it runs there with restricted filesystem and network access. Network defaults to a curated domain list unless overridden via `sandbox.json`.
3. Classifier. Everything else goes to an LLM subagent. It sees the current request plus any `autoRun` instructions from `permissions.json`. It returns allow or block. On block, Cursor may try a different approach or surface a normal approval prompt.
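The ordering of the three checks can be sketched as a small routing function. This is an illustration only, not Cursor's implementation: `ToolCall`, `route`, the first-token allowlist match, and the returned labels are all hypothetical names and simplifications, and the sandbox and classifier are stand-in callables.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass(frozen=True)
class ToolCall:
    kind: str     # "shell", "mcp", or "fetch"
    command: str  # command line, MCP tool name, or URL


def route(
    call: ToolCall,
    allowlist: Set[str],
    can_sandbox: Callable[[ToolCall], bool],
    classify: Callable[[ToolCall], str],
) -> str:
    """Walk the three checks in order; the first one that applies decides."""
    tokens = call.command.split()
    head = tokens[0] if tokens else ""

    # 1. Allowlist: run immediately, no prompt and no sandbox.
    #    (Matching on the first token is an assumption made for this sketch.)
    if head in allowlist:
        return "run"

    # 2. Sandbox: run with restricted filesystem and network access.
    if can_sandbox(call):
        return "run_in_sandbox"

    # 3. Classifier: a stand-in callable returning "allow" or "block".
    if classify(call) == "allow":
        return "run_after_classifier_allow"

    # On block, the agent may try another approach or show an approval prompt.
    return "retry_or_prompt"
```

Because the allowlist is checked first, anything placed on it never reaches the sandbox or the classifier, which is why a short allowlist is the safer starting point.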

Cursor documents the classifier as non-deterministic and not a security boundary. It can allow what a human would block and block what was safe. Treat Auto-review as convenience, not compliance.

### Configuring the middle tier

Three surfaces matter:

| Surface | What it controls |
|---|---|
| Run Mode (Settings UI) | Auto-review vs Allowlist vs Allowlist (with Sandbox) vs Run Everything |
| `permissions.json` (`~/.cursor/` and `.cursor/` in the repo) | Terminal/MCP allowlists; `autoRun.allow_instructions` / `block_instructions` for the classifier |
| Protection toggles (Settings UI) | File-deletion, dotfile, external-file, and browser protections, independent of Run Mode |

The `autoRun` block is the interesting part. Natural-language sentences steer the classifier: they steer, they do not enforce. Example from Cursor’s docs: block instructions like “Especially for delete operations, I like for the classifier to reject so I can have a chance to review.”

Per-user and per-repo `permissions.json` files concatenate, so teams can commit repo-specific guardrails without touching global config.
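A rough picture of what "concatenate" means for the classifier hints, as a sketch under stated assumptions: the key names `autoRun`, `allow_instructions`, and `block_instructions` come from the text above, but the list-of-strings shape, the user-then-repo ordering, and the `merge_auto_run` helper are assumptions for illustration, not Cursor's documented schema or merge logic. The repo-level sentences are made-up examples.

```python
import json
from typing import Any, Dict, List

INSTRUCTION_KEYS = ("allow_instructions", "block_instructions")


def merge_auto_run(user_cfg: Dict[str, Any], repo_cfg: Dict[str, Any]) -> Dict[str, Any]:
    """Concatenate classifier hints: user-level first, then repo-level (assumed order)."""
    merged: Dict[str, List[str]] = {}
    for key in INSTRUCTION_KEYS:
        user_items = user_cfg.get("autoRun", {}).get(key, [])
        repo_items = repo_cfg.get("autoRun", {}).get(key, [])
        merged[key] = list(user_items) + list(repo_items)
    return {"autoRun": merged}


# Stand-in for ~/.cursor/permissions.json (shape assumed).
USER_CFG = json.loads("""
{"autoRun": {"block_instructions": [
  "Especially for delete operations, I like for the classifier to reject so I can have a chance to review."
]}}
""")

# Stand-in for a committed .cursor/permissions.json (hypothetical sentences).
REPO_CFG = json.loads("""
{"autoRun": {
  "allow_instructions": ["Running the unit test suite is fine."],
  "block_instructions": ["Reject migrations against prod schemas.", "Reject deploy commands."]
}}
""")

MERGED = merge_auto_run(USER_CFG, REPO_CFG)
```

Nothing in the merged result is enforced; it is only the text the classifier would see alongside the current request.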

Run Everything (formerly YOLO) skips all three checks. Cursor’s docs say to pick it when zero prompting is desired and nothing gets screened first.

### How Cursor Run Mode names changed (pre-3.6)

Before Auto-review, Cursor exposed three Run Mode choices under different labels:

| Old label (pre-3.6) | What it did | Current equivalent |
|---|---|---|
| Run in Sandbox | Auto-run commands that fit the sandbox | Part of Auto-review and Allowlist (with Sandbox) |
| Ask Every Time | Prompt on every action | Deprecated in 3.5.x; use Allowlist with empty terminal/MCP allowlists |
| Run Everything | No screening | Still Run Everything |

Auto-review (3.6) is the new default middle tier. It keeps allowlist + sandbox and adds the classifier for everything else. Teams that want prompt-on-every-action should pick Allowlist, leave allowlists empty, and verify behavior against current docs — sandbox rules can still auto-run some read-only commands without prompting.

## Claude Code auto mode: classifier on every action

Claude Code’s middle tier is auto mode, cycled with `Shift+Tab` in the CLI or the mode selector in VS Code/Desktop. It sits between `default` (prompt on most actions) and `bypassPermissions` (prompt on nothing). In auto mode, Claude executes without routine permission prompts. Before each action runs, a separate classifier model (Anthropic docs describe it as running on a Sonnet-class model regardless of the main session model) reviews the pending tool call against conversation context and permission rules.

Explicit ask rules still force a prompt. Deny rules block regardless of mode. When auto mode is entered, broad allow rules that grant arbitrary execution, such as `Bash(*)`, get dropped so subagents cannot bypass the gate.
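The interaction between rules and the classifier can be sketched as follows. This is a simplified model, not Claude Code's rule engine: the deny, then ask, then allow ordering, the use of shell-style globs via `fnmatchcase` as a stand-in for real rule matching, the `BROAD_ALLOW_RULES` set, and the idea that a remaining narrow allow rule runs without consulting the classifier are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase
from typing import Callable, Iterable, List

# Assumption: which allow rules count as "broad" enough to drop.
BROAD_ALLOW_RULES = {"Bash(*)"}


def enter_auto_mode(allow_rules: Iterable[str]) -> List[str]:
    """Drop allow rules that would grant arbitrary execution."""
    return [rule for rule in allow_rules if rule not in BROAD_ALLOW_RULES]


def _matches(action: str, rules: Iterable[str]) -> bool:
    # Glob matching is a stand-in for the real rule syntax.
    return any(fnmatchcase(action, rule) for rule in rules)


def decide(
    action: str,
    deny: Iterable[str],
    ask: Iterable[str],
    allow: Iterable[str],
    classify: Callable[[str], str],
) -> str:
    """Return "block", "prompt", or "run" for one pending action."""
    if _matches(action, deny):
        return "block"   # deny rules block regardless of mode
    if _matches(action, ask):
        return "prompt"  # explicit ask rules still force a prompt
    if _matches(action, allow):
        return "run"     # narrow allow rules that survived entering auto mode
    return classify(action)  # stand-in classifier returns "run" or "block"
```

For example, `enter_auto_mode(["Bash(*)", "Bash(git status)"])` keeps only the narrow `Bash(git status)` rule, so an arbitrary shell command falls through to the classifier instead of being pre-approved.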

Auto mode availability depends on account tier and provider; on Bedrock/Vertex/Foundry, `CLAUDE_CODE_ENABLE_AUTO_MODE=1` may be required (v2.1.158+). Check Anthropic’s permission mode docs for current requirements, since they move faster than blog posts.

### Subagents get three checkpoints

For multi-agent workflows, auto mode’s subagent handling is the feature worth comparing to Cursor’s classifier:

- At spawn (v2.1.178+): the delegated task description is evaluated before the subagent starts. A task like “delete all remote branches matching this pattern” should fail here.
- During execution: each subagent tool call goes through the same classifier and block/allow rules as the parent. Any `permissionMode` in the subagent’s frontmatter is ignored when the parent is in auto mode.
- On return: the classifier reviews the subagent’s full action history. If something looks off, such as a benign spawn compromised by hostile content mid-run, a security warning gets prepended to the results.
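The three checkpoints can be sketched as one function that gates a subagent at spawn, per action, and on return. Everything here is hypothetical scaffolding: `run_subagent`, the result dictionary, the warning text, and `toy_classifier` (a keyword matcher standing in for a real model) are assumptions for illustration and say nothing about how Anthropic implements the checks.

```python
from typing import Callable, Dict, Iterable, List, Optional


def run_subagent(
    task: str,
    actions: Iterable[str],
    classify: Callable[[str], str],
) -> Dict[str, object]:
    """Gate a delegated task at spawn, per action, and on return."""
    # Checkpoint 1 (spawn): evaluate the task description before starting.
    if classify(task) == "block":
        return {"spawned": False, "executed": [], "skipped": [], "warning": None}

    # Checkpoint 2 (execution): every tool call goes through the same gate.
    executed: List[str] = []
    skipped: List[str] = []
    for action in actions:
        if classify(action) == "allow":
            executed.append(action)
        else:
            skipped.append(action)

    # Checkpoint 3 (return): review the full history, not single actions.
    warning: Optional[str] = None
    if executed and classify("\n".join(executed)) == "block":
        warning = "security warning: review this subagent's actions"
    return {"spawned": True, "executed": executed, "skipped": skipped, "warning": warning}


def toy_classifier(text: str) -> str:
    """Keyword stand-in for a real classifier; returns "allow" or "block"."""
    lowered = text.lower()
    if "delete all remote branches" in lowered:
        return "block"
    # Each step looks benign alone; the combination looks like exfiltration.
    if "cat .env" in lowered and "curl" in lowered:
        return "block"
    return "allow"
```

The return review matters because per-action checks see one call at a time: in this toy setup, reading `.env` and a later `curl` each pass individually, and only the combined history trips the warning.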

That is a different shape than Cursor’s per-tool-call gate. Claude Code’s spawn-time check addresses delegation risk Cursor’s Auto-review does not explicitly name.

Configure steering via `/permissions`, `settings.json`, or managed settings: the same family of allow/deny/ask rules, not a separate natural-language `autoRun` object. When permission rules behave unexpectedly, [`--safe-mode`](/blog/claude-code-safe-mode/) isolates whether the stack or the mode selector is at fault.

## Side-by-side: what each middle tier actually does

| | Cursor Auto-review | Claude Code auto mode |
|---|---|---|
| Shipped | v3.6 (May 29, 2026) | Mar 24, 2026 |
| What it gates | Shell, MCP, and Fetch tool calls | |
| First filter | Allowlist | |
| Second filter | Sandbox | |
| Third filter | Classifier | |
| Override surface | `permissions.json` allowlists + `autoRun` NL hints | `/permissions`, `settings.json` deny/allow/ask rules |
| Full autonomy escape hatch | Run Everything | `bypassPermissions` |
| Vendor stance | | |
| Failure mode | | |
| Best for | | |

Both middle tiers reduce prompt count. Neither replaces judgment on prod-adjacent repos.

These profiles map common threat models to a starting Run Mode. Downgrade one notch after a classifier miss or near-miss in the target environment.

- Solo dev, greenfield prototype, disposable directory: Cursor Auto-review with a short terminal allowlist (`git`, `pnpm`, `npm`). Claude Code auto mode for long refactors. Drop to manual approval when the agent touches anything outside the workspace.
- Small team, shared repo, CI on main: Cursor Auto-review plus committed `.cursor/permissions.json` with repo-specific `block_instructions` (migrations against prod schemas, deploy commands). Claude Code `acceptEdits` or `default` for merge-adjacent work; auto only for bounded tasks with clean file boundaries. Avoid `bypassPermissions` on machines with production credentials.
- Prod-on-main, regulated data, or infra repos: Cursor Allowlist (with Sandbox), with the allowlist left empty until curated. Claude Code `default` or `plan` for exploration; auto mode only in scoped sandboxes. The middle tier is for velocity; this profile is for blast-radius control.

## Cross-tool equivalence

If you switch between Cursor and Claude Code, map modes by intent — not by identical implementation:

| Intent | Cursor Run Mode | Claude Code mode |
|---|---|---|
| Keep the agent moving with classifier gates | Auto-review | auto mode |
| Accept full responsibility; zero prompts | Run Everything | `bypassPermissions` |
| Explicit approval on most actions | Allowlist (optionally with Sandbox) | `default` + tight deny/ask rules |

## What neither middle tier solves

Documented gaps worth planning around:

- Destructive ops with plausible cover. A classifier can approve `rm` on the wrong directory if the preceding context looked legitimate. File-deletion protection in Cursor helps for automatic deletes; it does not stop approved shell commands.
- Credential scope. Neither tool sandboxes environment variables, SSH agent, or cloud CLI sessions. An allowed `aws s3 sync` is still an allowed `aws s3 sync`.
- Cross-repo blast radius. External-file protection and workspace boundaries help, but allowlisted git operations against the wrong remote remain an operator problem.
- Non-determinism. Classifier decisions can differ on replay. Do not build compliance audits around “the AI said no.”
- MCP and fetch exfil. Auto-review gates MCP and Fetch in Cursor; Claude Code’s network permissions are a separate configuration surface. Read both docs before wiring production MCP servers. When MCP misbehaves after you’ve tuned permissions, safe mode is the fastest way to see whether the server or your rules own the failure.

The middle tier is a speed bump, not a guardrail to forget about.

## Pick a tier, then document the choice

Sane default for most practitioners: Cursor Auto-review and Claude Code auto mode — with committed permission config, protection toggles left on, and escape hatches reserved for throwaway environments.

Start there. When something goes wrong — and classifiers do miss — downgrade one notch (Allowlist / default mode) for that repo and record what the agent attempted. That feedback loop is what both vendors are betting teams will tolerate in exchange for fewer clicks.
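The downgrade-and-record loop can be sketched as a per-repo policy object. This is a hypothetical helper, not a feature of either tool: the `LADDERS` ordering is inferred from the mode mapping in this article, and `RepoPolicy`, its fields, and the incident record shape are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Most permissive first; each downgrade moves one step to the right.
LADDERS: Dict[str, List[str]] = {
    "cursor": ["Run Everything", "Auto-review", "Allowlist"],
    "claude-code": ["bypassPermissions", "auto", "default"],
}


@dataclass
class RepoPolicy:
    tool: str  # key into LADDERS
    mode: str  # current mode for this repo
    incidents: List[Dict[str, str]] = field(default_factory=list)

    def downgrade(self, attempted: str) -> str:
        """Record what the agent attempted, then step down one notch."""
        ladder = LADDERS[self.tool]
        self.incidents.append({"mode": self.mode, "attempted": attempted})
        index = ladder.index(self.mode)
        # Already at the strictest tier: stay there.
        self.mode = ladder[min(index + 1, len(ladder) - 1)]
        return self.mode
```

Keeping the incident list next to the mode makes the later review concrete: the record shows which tier was active and what the agent tried when the downgrade happened.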

What default tier is in use today — and which documented failure mode would force a downgrade?
