cd /news/ai-safety/why-i-won-t-run-untrusted-models-in-… · home topics ai-safety article
[ARTICLE · art-46084] src=jacob.gold ↗ pub= topic=ai-safety verified=true sentiment=↓ negative

Why I Won't Run Untrusted Models in My Coding Agent

A developer explains why they refuse to run untrusted AI models in coding agents, citing risks of backdoored code and arbitrary execution. They trust only Anthropic and OpenAI due to legal and financial incentives, while calling for truly open-source models as a solution.

read3 min views6 publishedJun 29, 2026
Why I Won't Run Untrusted Models in My Coding Agent
Image: Jacob (auto-discovered)

Coding agents work by sending your prompt and files to a model’s API over HTTP and receiving generated code and tool calls in return, including Bash scripts that execute on your machine.

Coding agents give the model and API provider arbitrary code execution on your computer.

A model can be designed to emit backdoored code when a trigger appears in its input. A model’s API can do the same based on the request’s country of origin, organization, or other metadata.

You shouldn’t run any model or API in a coding agent unless you would just as willingly download and run arbitrary code from that same provider. Because I don’t trust any of the open weight models or providers this much, I won’t use their models or APIs with my coding agent.

Models can be manipulated #

An open weight model can be trained to slant its text toward an ideology. In the same way, it can be trained to write bad code or run harmful commands when it sees a certain trigger.

it isn't great that all of the open models are at least fairly partially aligned with the ccp...

— hailey ([@hailey.at])[8:51 PM · Jun 28, 2026]

Models can be easily backdoored #

Adding backdoors to an API is trivial, but even “poisoning” the models themselves seems to be very easy. In Sleeper Agents (arXiv 2401.05566), Anthropic trained a model to write secure code when a prompt said “2023” and exploitable code when it said “2024”, and the backdoor survived fine-tuning, RL, and adversarial training. Models can also be manipulated cheaply during training, with as few as 250 poisoned documents (arXiv 2510.07192).

Why I trust Anthropic and OpenAI’s models and APIs #

Of course Anthropic’s and OpenAI’s models and APIs can have bugs and mistakes that cause problems. What I trust is that they won’t be deliberately malicious. This has nothing to do with trusting their ethics. I trust that their own self-interest and the US legal system are powerful enough incentives. Anthropic already agreed to pay at least $1.5 billion to settle a copyright class action brought by authors, the largest copyright settlement in history. They know they have to tread carefully.

I really, really want open models #

I’m a huge believer in open source software and spreading knowledge and power as widely as possible. Nobody should want a few big companies owning our new system for agentic coding and computing. I want open weight models I can run myself without compromising my privacy and without paying huge markups.

Subscriptions are cheap for professionals #

Part of the reason people use open weight models and APIs is cost. But, pragmatically, Claude and Codex offer flat-rate subscriptions at $100/mo and $200/mo, which provide sufficient tokens for most full-time developers. Subsidized by investor money, they’re a hard deal to complain about.

What would actually fix this #

Open weights are not open source. Weights are more like “compiled binaries”, not source code. What we ultimately want are fully open source models, with the training code and data open enough that anyone could reproduce them or build their own.

We can’t trust open weight models, but we could trust open source models.

── more in #ai-safety 4 stories · sorted by recency
── more on @anthropic 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/why-i-won-t-run-untr…] indexed:0 read:3min 2026-06-29 ·