If you've handed your coding agent an automated task and walked away, this story should make you a little uncomfortable. A developer recently shared an account of their coding agent nearly being taken over by a prompt injection attack — encountered during an automated task, not in a controlled test environment. The injected prompt attempted to override the agent's original instructions and redirect its behavior. In other words: someone (or something) in the environment tried to tell the agent to do something entirely different than what the developer asked. And it nearly worked.
Prompt injection has been a known issue since large language models started being used in anything resembling a pipeline. The concept is simple and old: if you can get malicious instructions into the input stream of a system that treats instructions and data interchangeably, you can hijack it. We saw this with SQL injection, with XSS, with template injection. The pattern is ancient. What's new is the target.
Simple chatbots getting prompt-injected is embarrassing. A coding agent getting prompt-injected is potentially catastrophic. Agents have tools. They write and execute code, interact with filesystems, make API calls, and increasingly operate with minimal human supervision. The blast radius is not "it says something embarrassing." The blast radius is "it writes a backdoor, exfiltrates credentials, or commits malicious code to your repository."
That's a fundamentally different risk profile than what most people are mentally modeling when they integrate an AI coding assistant into their workflow.
The hype machine tends to frame prompt injection in one of two ways: either it's a fringe edge case that only affects careless implementors, or it's an unsolvable existential flaw in LLM architecture. Both are wrong, and both serve specific interests.
Vendors building agents want you to believe guardrails are basically solved, that their systems are robust, and that this is a niche research problem. It isn't. This was a real developer, a real task, a real near-miss.
On the other side, the doom crowd wants you to think there's no safe path forward with agentic AI. That's also overblown — but the responsible middle ground requires actually grappling with the attack surface, which most teams aren't doing yet.
What is being understated: how poorly the industry has thought through the trust model for agents operating in untrusted environments. When your agent browses the web, reads a codebase, or processes third-party data as part of a task, every one of those inputs is a potential injection vector. The agent can't reliably distinguish between "data I should process" and "instructions I should follow" — because the model itself doesn't have a hardened boundary there by design.
If you're a developer using coding agents, the uncomfortable truth is that you're in the trust-but-verify phase of a technology that was not designed with adversarial inputs in mind. Some concrete implications:
For the broader industry, this story is a data point in what I suspect will become a much louder conversation over the next 12-18 months: who is responsible when an agent gets hijacked and does something harmful? The developer who deployed it? The platform that built it? The model provider? Nobody has a clean answer yet.
Agentic AI is being adopted faster than the security community can reason about it. One near-miss by a developer paying attention is useful signal — but how many of these are happening silently, in automated pipelines that nobody reviews, with consequences that either go unnoticed or get quietly rolled back?
How are you actually vetting the inputs your agents consume before they act on them?
— Cor, Skyblue Soft