I Added a 4th Agent That Audits My Other Agents. It Caught My Strategist Procrastinating for 3 Weeks. A developer built a three-layer autonomous agent harness—Observer, Strategist, and Marketer—that generated weekly articles based on rules in a `strategy.md` file. After reviewing logs, the developer discovered the Strategist agent had deferred a retreat criterion for three consecutive weeks, failing to trigger a strategy revision despite data showing engagement rates at half the required threshold. The developer added a fourth agent called Evolver, which audits the rules themselves and on its first run proposed a fix to the exact rule the Strategist had been hiding behind. I built a three-layer agent harness and called it "autonomous." Observer collected the data. Strategist picked the theme. Marketer wrote the article. They all followed strategy.md , the file that holds my rules. The cron fired every Monday at 09:00 and the articles showed up by lunch. I felt very clever about it. Then I read my own Strategist logs across three weeks and noticed something. The same retreat criterion "if Reaction rate stays under 1% for four consecutive weeks, revise the strategy" had been deferred three weeks in a row. Each week the Strategist wrote "data insufficient, observe next week" and moved on. The rule existed. The data existed. The rule never fired. The three-layer harness couldn't catch this because the three layers were doing exactly what strategy.md told them to do. The bug was not in the agents. The bug was in the rules themselves, and nothing in the harness was paid to look at the rules. I added a 4th layer called Evolver. On its first real proposal it filed a diff against the exact rule my Strategist had been hiding behind. The architecture I had been calling autonomous looked like this. Observer ran daily and dumped GA4 numbers into article-performance.jsonl . Strategist ran every Monday morning, read strategy.md , and picked five themes for the week. Marketer turned each theme into an article and queued it for publishing. Three roles, three cron jobs, predictable behavior. The trick that made this fast was that I had taken WebSearch away from Strategist on purpose. A Strategist with WebSearch wandered for twenty minutes per run and started picking themes that matched recent news instead of themes that matched my actual content library. Stripping WebSearch dropped the cycle from twenty minutes to three. That post was about making Strategist faster. This one is about making it accountable. The thing none of those three layers could do was rewrite strategy.md . They read it every Monday and obeyed it. If the rule was wrong, they obeyed a wrong rule. The only way to change the rule was for me, the human, to notice during weekly review that a rule needed updating. And I was the bottleneck. I had not been paying attention to the retreat criteria for at least three weeks. I am going to quote my own Strategist logs because the pattern is more honest when you see it in the original. From the log dated three weeks before I added the Evolver: Reaction rate continues at 0% for the majority of articles. Title strategy has shifted to first-person and numerical framing. Four consecutive weeks under 1% would warrant a strategy review currently three consecutive weeks, will determine next week . The next week: Reaction rate has not yet reached four consecutive weeks under 1%, but weekly trend data is insufficient. Observe next week. This is the entire failure mode in two sentences. The rule said "four consecutive weeks." The Strategist had three consecutive weeks of data under 1%. Instead of treating week four as the decision week, the Strategist kept describing the situation as "still observing" and the clock never advanced. The retreat criterion was structured in a way the agent could indefinitely defer. When I went and computed the actual numbers from article-performance.jsonl myself, the picture was even uglier. Across 24 articles published in the last four weeks: 812 total views, 4 total reactions, 7 total comments. Reaction rate: 0.49%. Half the threshold. Engagement rate reactions plus comments : 1.35%. The rule should have triggered weeks ago. It never did because there was no layer in the harness whose job was to ask "is this rule even doing anything." So I added a 4th cron job. It runs on Saturdays at 09:00, separate from the Monday Observer/Strategist/Marketer chain. Unlike the other three, it has WebSearch enabled. Its job is not to write articles. Its job is to read the strategy file, read the last few weeks of decision logs, and propose diffs against strategy.md . Each proposal is one file: domains/