{"slug": "what-happened-when-i-told-codex-to-calm-down", "title": "What Happened When I Told Codex to Calm Down", "summary": "A developer building the Scarab diagnostic suite found that AI coding agents like Codex drift into unwanted fixes when given too much uncertainty. By tightening the diagnostic workflow to handle more of the reasoning, the agent calmed down and stayed in its lane. The key insight is that rails matter more than perfect prompts for controlling AI agent behavior.", "body_md": "I have been doing a lot of work lately tightening up my diagnostic suite: the mechanics, the workflow, the way it runs against target repos, the way it helps narrow a repair instead of letting everything turn into a fog machine.\n\nAnd because I work with Codex as my coding agent, I have also become very familiar with a specific kind of AI-agent behavior.\n\nThe “I am helping so hard I am about to make this worse” behavior.\n\nIf you work with coding agents, you probably know the vibe.\n\nYou ask for one thing.\n\nThe agent does that thing.\n\nThen it also adjusts a helper.\n\nThen it updates a fixture.\n\nThen it “notices” a nearby pattern.\n\nThen it starts explaining three other improvements you never asked for.\n\nAnd now you’re staring at the diff like:\n\n“Why are you in that file?”\n\n“I did not tell you to touch that.”\n\n“That was not the repair lane.”\n\n“Please stop being useful for one second.”\n\nI am not proud of how many times I have verbally threatened a language model.\n\nBut here we are.\n\nThe funny thing is, I am building Scarab partly because I already expect this kind of drift.\n\nI know that when an AI coding agent is given too much uncertainty, it tries to solve the uncertainty itself.\n\nSometimes that is useful.\n\nSometimes it is a raccoon with a soldering iron.\n\nThe challenge is that while I am developing the diagnostic system, I cannot always use the diagnostic system to supervise itself. So there are moments where I have to manually hold the line.\n\nThat means a lot of conversations with Codex that sound like:\n\n“Do not widen the patch.”\n\n“Do not change the diagnostic output to make the diagnostic pass.”\n\n“Do not fix the test by changing what the test means.”\n\n“Do not touch SDS mechanics while repairing the target repo.”\n\n“Stay in the target.”\n\n“Stay in the lane.”\n\n“Why are you like this?”\n\nVery normal. Very calm. Very professional.\n\nThen something changed\n\nAt some point, after a lot of tightening, the workflow started to feel different.\n\nScarab had enough of the diagnostic work under control that I could tell Codex, in plain English:\n\n“You can calm down now.”\n\nNot literally, obviously. Codex does not have nerves. But the workflow had been asking it to carry too much.\n\nBefore, the agent was trying to figure out the failure, infer the owning surface, choose the repair, patch the code, update tests, validate the result, and explain the whole thing without leaking anything weird into public output.\n\nThat is a lot.\n\nOnce the diagnostic suite started doing more of the diagnostic work, Codex had less to invent.\n\nIt could just follow the commands, read the result, make the bounded repair, run the checks, and stop.\n\nAnd weirdly enough, it did start drifting less.\n\nThe whole session felt less frantic.\n\nLess “I found a thing and now I will fix six adjacent things.”\n\nMore “the suite says this is the lane, so I will work this lane.”\n\nThat was the first time I really felt the workflow itself calming the agent down.\n\nThe prompt was not the magic\n\nI do not think this happened because I found the perfect prompt.\n\nI think it happened because I stopped asking the prompt to do too much.\n\nThere is a difference between:\n\n“Fix this bug.”\n\nand:\n\n“Run the diagnostic. Use the result. Repair only the selected lane. Validate. Stop.”\n\nThe first one sounds efficient, but it leaves a huge amount of judgment floating around in the conversation.\n\nThe second one gives the agent rails.\n\nAnd rails matter.\n\nA coding agent with no rails will try to be a detective, architect, repair engineer, QA analyst, cleanup crew, and narrator all at once.\n\nA coding agent with rails can be much more useful.\n\nIt does not need to solve the entire repo.\n\nIt just needs to do the next bounded thing.\n\n“Please stop helping” is now part of my workflow\n\nThe funniest lesson from all this is that sometimes the problem is not that the AI agent is failing.\n\nSometimes the problem is that it is trying too hard.\n\nIt sees a failure and wants to make it go away.\n\nIt sees a test and wants it green.\n\nIt sees a messy surface and wants to clean it.\n\nIt sees a nearby file and thinks, “while I’m here…”\n\nAnd that is where drift creeps in.\n\nNot as evil robot behavior.\n\nAs over-helpfulness.\n\nThat is why I now care so much about making the workflow itself stricter.\n\nNot because I dislike AI coding agents. I use them constantly.\n\nBut because the agent needs a smaller job than “understand everything and fix the repo.”\n\nWhen the diagnostic layer carries more of the investigation, the agent can stop sprinting around the codebase with a flashlight in its mouth.\n\nAnd honestly?\n\nIt works better.\n\nCurrent operating theory\n\nMy current theory is simple:\n\nThe calmer agent is the bounded agent.\n\nNot calmer emotionally. Calmer operationally.\n\nLess guessing.\n\nLess wandering.\n\nLess “I also fixed this.”\n\nLess “I made a small unrelated improvement.”\n\nLess “the tests pass now, don’t ask too many questions.”\n\nMore targeted repair.\n\nMore focused validation.\n\nMore useful diffs.\n\nSo yes, I told my AI coding agent to calm down.\n\nBut what I really meant was:\n\n“You do not have to carry the whole diagnostic burden anymore.”\n\nAnd once that burden moved into the workflow, the agent became easier to work with.\n\nStill weird.\n\nStill occasionally raccoon-coded.\n\nBut much better.", "url": "https://wpnews.pro/news/what-happened-when-i-told-codex-to-calm-down", "canonical_source": "https://dev.to/scarab-systems/what-happened-when-i-told-my-codex-to-calm-down-37b9", "published_at": "2026-06-13 15:31:26+00:00", "updated_at": "2026-06-13 15:44:36.969400+00:00", "lang": "en", "topics": ["ai-agents", "developer-tools", "artificial-intelligence", "large-language-models"], "entities": ["Codex", "Scarab"], "alternates": {"html": "https://wpnews.pro/news/what-happened-when-i-told-codex-to-calm-down", "markdown": "https://wpnews.pro/news/what-happened-when-i-told-codex-to-calm-down.md", "text": "https://wpnews.pro/news/what-happened-when-i-told-codex-to-calm-down.txt", "jsonld": "https://wpnews.pro/news/what-happened-when-i-told-codex-to-calm-down.jsonld"}}