Breaking Build: Kiro and Claude delivered exactly what I asked, and it wasn't what I wanted A developer building with AI agents at Agentis Lux discovered that their deployed scanner returned the same score of 62 for every site, including a false finding about a checkout button on a site without one. The pipeline had run only once in May, deploying old code, while newer changes sat untested in the repo. The developer realized the agents executed instructions literally, exposing a gap between intention and instruction, a pattern echoed in Anthropic's talk at the AWS Summit. Building in public means showing the part where the robots did great work on the wrong thing. The deploy on Agentis Lux succeeded. Green check, no errors, site live. I scanned my own site to grab a "before" shot for a before-and-after, and the scanner handed back a score of 62. It handed back 62 for the next site too. And the next one. Same score, same findings, every time, including a finding about a "checkout button" on a site that has no checkout button. The build worked. It was running a version of the scanner I'd written weeks ago and abandoned. Everything I'd built since then was sitting in the repo, merged, tested, and not deployed. The deploy pipeline had run exactly once, in May, and never again. AND I NEVER NOTICED So the live site was a confident, well-tested, fully-green stub. Technically , nothing went wrong. That's the part I keep mulling over...and over...and over. I build with AI agents. I direct, they generate. One agent writes the infrastructure, another audits it, I make the calls and merge. It's fast and it's good, and the failure mode is not what I expected. I expected the agents to make mistakes. They mostly don't. What they do instead is build exactly what I asked for, correctly, when what I asked for wasn't what I wanted. The bug isn't in the code. The bug is in the gap between my instruction and my intention, and the agent fills that gap with whatever's most literally true. This exact thing, context engineering, came up at Anthropic's talk at the AWS Summit https://dev.to/earlgreyhot1701d/aws-summit-los-angeles-2026-why-am-i-always-learning-the-hard-way-46lb . A human orchestrator, in this case...me, pushes back. "You said deploy, but the pipeline hasn't run since May, did you mean redeploy the current code?" An agent says "deploy succeeded" because the deploy did, in fact, succeed. It answered the question I asked. I asked the wrong question that sat clearly in my blind spot. I hit this four times on one project in about a week. Same shape every time. The stub that shipped. The 62 that came back for every single site, the Groundhog Day score. The infrastructure was real, the tests were green, the deploy worked. It just deployed code I'd left behind. "Is it deployed" was true. "Is the thing I built deployed" was the question I forgot to ask. Lesson: Don't assume. The three doors, one of them real. My scanner takes three kinds of input: a URL, a code repo, an API spec. The interface showed three tabs for them. Clean, obvious, exactly what the design implied. Only the URL one was wired up. The other two were built to the spec I gave, which described three tabs, and I'd later decided to ship only URL scanning first and never updated the interface to match. So a visitor clicks "API spec," types something in, and hits a polite wall. The tabs were correct. My scope had moved and the tabs hadn't heard about it. Lesson: Kiro and Claude can't read my mind The findings only an engineer could read. My whole audience is people who build with AI and may not know what a