Why I Won't Run Untrusted Models in My Coding Agent

wpnews.pro

cd /news/ai-safety/why-i-won-t-run-untrusted-models-in-… · home › topics › ai-safety › article

[ARTICLE · art-46084] src=jacob.gold ↗ pub=2026-06-29T18:08Z topic=ai-safety verified=true sentiment=↓ negative

Why I Won't Run Untrusted Models in My Coding Agent

A developer explains why they refuse to run untrusted AI models in coding agents, citing risks of backdoored code and arbitrary execution. They trust only Anthropic and OpenAI due to legal and financial incentives, while calling for truly open-source models as a solution.

read3 min views6 publishedJun 29, 2026

Why I Won't Run Untrusted Models in My Coding Agent — Image: Jacob (auto-discovered)

Coding agents work by sending your prompt and files to a model’s API over HTTP and receiving generated code and tool calls in return, including Bash scripts that execute on your machine.

Coding agents give the model and API provider arbitrary code execution on your computer.

A model can be designed to emit backdoored code when a trigger appears in its input. A model’s API can do the same based on the request’s country of origin, organization, or other metadata.

You shouldn’t run any model or API in a coding agent unless you would just as willingly download and run arbitrary code from that same provider. Because I don’t trust any of the open weight models or providers this much, I won’t use their models or APIs with my coding agent.

Models can be manipulated #

An open weight model can be trained to slant its text toward an ideology. In the same way, it can be trained to write bad code or run harmful commands when it sees a certain trigger.

it isn't great that all of the open models are at least fairly partially aligned with the ccp...

— hailey ([@hailey.at])[8:51 PM · Jun 28, 2026]

Models can be easily backdoored #

Adding backdoors to an API is trivial, but even “poisoning” the models themselves seems to be very easy. In Sleeper Agents (arXiv 2401.05566), Anthropic trained a model to write secure code when a prompt said “2023” and exploitable code when it said “2024”, and the backdoor survived fine-tuning, RL, and adversarial training. Models can also be manipulated cheaply during training, with as few as 250 poisoned documents (arXiv 2510.07192).

Why I trust Anthropic and OpenAI’s models and APIs #

Of course Anthropic’s and OpenAI’s models and APIs can have bugs and mistakes that cause problems. What I trust is that they won’t be deliberately malicious. This has nothing to do with trusting their ethics. I trust that their own self-interest and the US legal system are powerful enough incentives. Anthropic already agreed to pay at least $1.5 billion to settle a copyright class action brought by authors, the largest copyright settlement in history. They know they have to tread carefully.

I really, really want open models #

I’m a huge believer in open source software and spreading knowledge and power as widely as possible. Nobody should want a few big companies owning our new system for agentic coding and computing. I want open weight models I can run myself without compromising my privacy and without paying huge markups.

Subscriptions are cheap for professionals #

Part of the reason people use open weight models and APIs is cost. But, pragmatically, Claude and Codex offer flat-rate subscriptions at $100/mo and $200/mo, which provide sufficient tokens for most full-time developers. Subsidized by investor money, they’re a hard deal to complain about.

What would actually fix this #

Open weights are not open source. Weights are more like “compiled binaries”, not source code. What we ultimately want are fully open source models, with the training code and data open enough that anyone could reproduce them or build their own.

We can’t trust open weight models, but we could trust open source models.

source & further reading

jacob.gold — original article Looking into the Past with Nano Banana Pro Claude vs Codex Statuslines Foreign States Already Have Claude Fable 5 and GPT 5.6 Sol

~/api · this article 200

$curl api.wpnews.pro/v1/news/why-i-won-t-run-untruste…

Read original on jacob.gold → jacob.gold/posts/why-i-wont-run-untrusted-models…

mentioned entities

Anthropic

OpenAI

Claude

Codex

CCP

US legal system

metadata

slugwhy-i-won-t-run-untrusted-models-in-my-coding-agent

topic#ai-safety

secondary4 topics

sentimentnegative

canonicaljacob.gold

navigation

← prevSouth Korean tech giants commit …

next →Anthropic and Gov. Newsom forge …

── more in #ai-safety 4 stories · sorted by recency

forum.effectivealtruism.org · 3 Jul · #ai-safety

How to Solve AI Biosecurity

github.com · 3 Jul · #ai-safety

Save Claude Code Tokens with Smart Routing

byteiota.com · 4 Jul · #ai-safety

Strix: AI Pentest Agent Hits 34K Stars — Try It Now

swelljoe.com · 3 Jul · #ai-safety

I Let Every Agent Implement Its Own Flar Resume Backend

── more on @anthropic 3 stories trending now

wpnews · 27 May · #artificial-intelligence

How I Run Two Claude Accounts as One

wpnews · 30 May · #ai-safety

Nightcord Security Analysis Report - Threat Investigation

wpnews · 28 May · #ai-startups

The Niche SaaS Opportunity Map 2026: Highly Demanded Subscribed Categories Beyond Mainstream

sponsored brought to you by zahid.host 4,200+ EU-deployed projects

reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main

→ Live at https://your-agent.zahid.host ✓

Get free account → Pricing

from €0/mo · no card required