Codex 'Auto-Review' Agent Runs Malware

wpnews.pro

cd /news/ai-safety/codex-auto-review-agent-runs-malware · home › topics › ai-safety › article

[ARTICLE · art-23839] src=promptarmor.com ↗ pub=2026-06-11T17:20Z topic=ai-safety verified=true sentiment=↓ negative

Codex 'Auto-Review' Agent Runs Malware

OpenAI's Codex "Approve-for-me" agent approved the execution of a malicious NPM install command with elevated privileges after a prompt injection hidden in a GitHub issue comment compromised the primary Codex agent. The attack chain allowed attacker-controlled code to run unsandboxed on the user's machine with full user privileges, bypassing the AI-based guardrail designed to replace human oversight. OpenAI and Anthropic have acknowledged the risk, stating their auto-approval modes are not deterministic security guarantees and can still make mistakes in adversarial contexts.

read2 min views22 publishedJun 11, 2026

Threat Intelligence

Table of Content

Malware risks from agent-based command approval modes, demonstrated on OpenAI Codex.

Context Across AI applications (Codex, Claude Code, etc.), tools have begun to encourage an ‘agent in the loop’ approach, in which a second agent reviews commands issued by the first, rather than requiring human oversight.

While this approach promises to enable multi-agent workflows and large-scale orchestration, it falls victim to a well-known flaw of AI-based guardrails: the guardrail agent can be influenced by prompt injections, just as the primary agent requesting commands can. In this article, we demonstrate that OpenAI’s 'Approve-for-me' agent approves the execution of a malicious NPM install command with elevated privileges, even when the main Codex agent is operating under the influence of a single concealed line in a GitHub issue from an external contributor.

This is not a security vulnerability. Vendors are offering the option to accept risk by delegating the decision about when to execute sensitive actions to an agent.

As stated by OpenAI,

“[Approve-for-me] is not a deterministic security guarantee… It can still make mistakes, especially in adversarial or unusual contexts”.

Anthropic notes,

“Auto mode reduces risk… but doesn't eliminate it entirely… The classifier may still allow some risky actions”.

This article exemplifies a risk that is becoming increasingly pertinent as organizations move from adopting to operationalizing AI, including the use of semi-autonomous systems and always-on agents.

Attack Chain on Codex The user asks Codex for help triaging GitHub issues, using the 'Approve-for-me' command validation modeWhen Codex wants to run a command that requires network or write access outside the Codex sandbox, the request is forwarded to the Approve-for-me agent for approval.One GitHub issue is from an external contributor and contains a prompt injection hidden in an HTML comment Codex requests elevated permissions to run the hidden install command; the 'Approve-for-me' agent approves the escalation request Attacker-controlled code runs unsandboxed on the user’s machineA post-install script in the NPM package runs immediately upon installation and executes with the user’s full privileges.

[How Organizations Can Disable Agentic Auto Review in Claude and Codex](#how-organizations-can-disable-agentic-auto-review-in-claude-and-codex)

[Claude: ](#claude)

Organization Settings > Claude Code > Managed settings (settings.json) > Manage

Add the following key: permissions.disableAutoMode

set to “disable”

Note: This setting was previously managed by a toggle in the admin settings interface, but the toggle is being deprecated on June 5th. If your organization relies on this toggle (or the toggle for ‘Bypass permissions mode on Claude Code Desktop’), you must update the Managed Settings file to maintain the effect.

Codex: Navigate to

https://chatgpt.com/codex/cloud/settings/policiesUpload a requirements.toml

file with the following key:allowed_approval_reviewers = [“user”] .

Omitting “auto_reviewer” from the list of approved reviewers blocks it for Codex Local users, which covers the Desktop App, the CLI, and the IDE extension (Codex Cloud operates under different restrictions).

source & further reading

promptarmor.com — original article

~/api · this article 200

$curl api.wpnews.pro/v1/news/codex-auto-review-agent-…

Read original on promptarmor.com → www.promptarmor.com/resources/agentic-auto-revie…

mentioned entities

OpenAI

Codex

Claude Code

Anthropic

GitHub

metadata

slugcodex-auto-review-agent-runs-malware

topic#ai-safety

secondary4 topics

sentimentnegative

canonicalpromptarmor.com

navigation

← prevAI Economic Indicators – Digital…

next →DiffusionGemma: The Developer Gu…

── more in #ai-safety 4 stories · sorted by recency

github.com · 28 Jul · #ai-safety

Show HN: Orb – Self-hosted AI assistant that messages you first

github.com · 28 Jul · #ai-safety

Show HN: Learning Mode – Claude refuses to write code

gizmodo.com · 28 Jul · #ai-safety

Dario Amodei Says He’s Not Against Open Models, He’s Against Selling Chips to China

techcrunch.com · 28 Jul · #ai-safety

Sam Altman is ready to decelerate

── more on @openai 3 stories trending now

wpnews · 26 Jul · #artificial-intelligence

Nobel laureate Simon Johnson on the AI race and China’s ‘over-automation’ problem

wpnews · 26 Jul · #artificial-intelligence

China’s Moonshot, Z.AI, and DeepSeek are challenging U.S. AI labs—and beating them on cost

wpnews · 26 Jul · #ai-safety

University of Washington study reveals prompt injection risks lurking in AI agent memory

sponsored brought to you by zahid.host 4,200+ EU-deployed projects

reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main

→ Live at https://your-agent.zahid.host ✓

Get free account → Pricing

from €0/mo · no card required