{"slug": "why-i-won-t-run-untrusted-models-in-my-coding-agent", "title": "Why I Won't Run Untrusted Models in My Coding Agent", "summary": "A developer explains why they refuse to run untrusted AI models in coding agents, citing risks of backdoored code and arbitrary execution. They trust only Anthropic and OpenAI due to legal and financial incentives, while calling for truly open-source models as a solution.", "body_md": "# Why I Won't Run Untrusted Models in My Coding Agent\n\nCoding agents work by sending your prompt and files to a model’s API over HTTP and receiving generated code and tool calls in return, including Bash scripts that execute on your machine.\n\n**Coding agents give the model and API provider arbitrary code execution on your computer.**\n\nA model can be designed to emit backdoored code when a trigger appears in its input. A model’s API can do the same based on the request’s country of origin, organization, or other metadata.\n\nYou shouldn’t run any model or API in a coding agent unless you would just as willingly download and run arbitrary code from that same provider. Because I don’t trust any of the open weight models or providers this much, I won’t use their models or APIs with my coding agent.\n\n## Models can be manipulated\n\nAn open weight model can be trained to slant its text toward an ideology. In the same way, it can be trained to write bad code or run harmful commands when it sees a certain trigger.\n\nit isn't great that all of the open models are at least fairly partially aligned with the ccp...\n\n— hailey (@hailey.at), 8:51 PM · Jun 28, 2026\n\n## Models can be easily backdoored\n\nAdding backdoors to an API is trivial, but even “poisoning” the models themselves seems to be very easy. 
In Sleeper Agents ([arXiv 2401.05566](https://arxiv.org/abs/2401.05566)), Anthropic trained models to write secure code when the prompt said the year was 2023 and exploitable code when it said 2024, and the backdoor survived supervised fine-tuning, reinforcement learning, and adversarial training. Models can also be manipulated cheaply during training, with as few as 250 poisoned documents ([arXiv 2510.07192](https://arxiv.org/abs/2510.07192)).\n\n## Why I trust Anthropic and OpenAI’s models and APIs\n\nOf course Anthropic’s and OpenAI’s models and APIs can have bugs and mistakes that cause problems. What I trust is that they won’t be deliberately malicious. This has nothing to do with trusting their ethics. I trust that their own self-interest and the US legal system are powerful enough incentives. Anthropic already agreed to [pay at least $1.5 billion](https://www.cnbc.com/2025/09/05/anthropic-to-pay-1point5-billion-to-settle-authors-copyright-lawsuit-.html) to settle a copyright class action brought by authors, the largest copyright settlement in history. They know they have to tread carefully.\n\n## I really, really want open models\n\nI’m a huge believer in open source software and spreading knowledge and power as widely as possible. Nobody should want a few big companies owning our new system for agentic coding and computing. I want open weight models I can run myself without compromising my privacy and without paying huge markups.\n\n## Subscriptions are cheap for professionals\n\nPart of the reason people use open weight models and APIs is cost. But, pragmatically, Claude and Codex offer flat-rate subscriptions at $100/mo and $200/mo, which provide sufficient tokens for most full-time developers. Subsidized by investor money, they’re a hard deal to complain about.\n\n## What would actually fix this\n\nOpen weights are not open source. Weights are more like “compiled binaries”, not source code. 
What we ultimately want are fully open source models, with the training code and data open enough that anyone could reproduce them or build their own.\n\nWe can’t trust **open weight** models, but we could trust **open source** models.", "url": "https://wpnews.pro/news/why-i-won-t-run-untrusted-models-in-my-coding-agent", "canonical_source": "https://jacob.gold/posts/why-i-wont-run-untrusted-models/", "published_at": "2026-06-29 18:08:00+00:00", "updated_at": "2026-07-01 06:22:21.117494+00:00", "lang": "en", "topics": ["ai-safety", "ai-ethics", "large-language-models", "ai-agents", "ai-policy"], "entities": ["Anthropic", "OpenAI", "Claude", "Codex", "CCP", "US legal system"], "alternates": {"html": "https://wpnews.pro/news/why-i-won-t-run-untrusted-models-in-my-coding-agent", "markdown": "https://wpnews.pro/news/why-i-won-t-run-untrusted-models-in-my-coding-agent.md", "text": "https://wpnews.pro/news/why-i-won-t-run-untrusted-models-in-my-coding-agent.txt", "jsonld": "https://wpnews.pro/news/why-i-won-t-run-untrusted-models-in-my-coding-agent.jsonld"}}