My Agent Skill for Test-Driven Development


Source: saturnci.com. Published 2026-06-04.


Software engineer Jason Swett has developed a specialized skill to improve AI agents' ability to write effective tests using Test-Driven Development (TDD), addressing the common problem of agents producing poor-quality tests. His approach, detailed in a GitHub repository, combines Kent Beck's Canon TDD with a "specify-encode-fulfill" loop and separate design review skills to guide agents in writing clear, meaningful tests. The method aims to overcome the tendency of AI agents to replicate the flawed testing practices found in human-written examples.


AI agents tend to be, at least as of this writing, lousy at writing tests. The tests they write are often vague, cryptic, overcomplicated, hacky, disorganized, tautological, performative, perfunctory and downright pointless.

Unfortunately, I don't expect uncoached agents to get much better at writing tests anytime soon, because the agents learned by human-written example, and the human-written examples out there are often, I'm sorry to say, just as bad. Not only are the tests written by "amateurs" often poor quality, but, sadly, the testing practices preached by teachers are often pretty bad as well. It's truly rough out there.

The good news is that I've found that, with a bit of guidance, agents are capable of following a rational TDD process and of writing clear, meaningful tests. What exactly is that guidance? The short answer, which can serve as a close enough approximation to the truth, is Kent Beck's [Canon TDD](https://tidyfirst.substack.com/p/canon-tdd). If you give your agent a skill that says nothing more than "Follow Kent Beck's Canon TDD" then I suspect you'll be a good 60% of the way there. The longer answer is what I've baked into my own personal TDD skill.

My TDD skill

Since it's a living document, I don't want to bake my TDD skill into this blog post and freeze it in time. Instead, you can see my TDD skill here on GitHub. Having said that, I can certainly share the essence of the skill here since I'm sure that that isn't going to change.

First I clue the agent in to what I call the specify-encode-fulfill loop, which is my personal alternative to red-green-refactor. Specify-encode-fulfill (SEF) goes like this:

- **Specify**: Come up with the specifications for what you want to build
- **Encode**: Encode those specifications as automated tests (executable specifications)
- **Fulfill**: Write the code to fulfill the specifications
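As a rough illustration of one pass through SEF, here is a minimal sketch in Python. The `Order` class and its two specifications are hypothetical and are not taken from the skill itself; they only show a plain-language specification, its encoding as a test, and the code that fulfills it.

```python
import unittest

# Specify (plain language, hypothetical example):
#   "An order with no line items has a total of zero."
#   "An order's total is the sum of its line item prices."


class Order:
    """Fulfill: just enough code to satisfy the encoded specifications."""

    def __init__(self, line_item_prices=None):
        self.line_item_prices = list(line_item_prices or [])

    def total(self):
        return sum(self.line_item_prices)


class OrderTotalTest(unittest.TestCase):
    """Encode: each specification becomes one executable test."""

    def test_order_with_no_line_items_has_total_of_zero(self):
        self.assertEqual(Order().total(), 0)

    def test_total_is_sum_of_line_item_prices(self):
        self.assertEqual(Order([5, 7]).total(), 12)
```

Saved to a file, this can be run with `python -m unittest`. Each test name restates its specification, so the test file doubles as a readable list of what the code is supposed to do.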

SEF is the high-level view of what, to me, TDD is all about. At a slightly lower level is Kent Beck's Canon TDD, which I've described below in my own words.

1. Write a list of the specifications within scope of the current TDD session
2. Encode each item in the list as an automated test
3. Change the code just barely enough to make the current test failure go away. Avoid "speculative coding" - if we write more code than necessary to make the current test failure go away, we risk having code never exercised by any test
4. Optionally refactor, but not before committing the behavior change. Never mix behavior changes with refactoring
5. Until the list is empty, go back to #2
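To make the "just barely enough" step concrete, here is a small sketch of where the code might stand after working through a two-item list. The `slugify` function and its specifications are invented for illustration and do not come from the skill or from Canon TDD.

```python
# The test list for this session (hypothetical specifications):
#   - slugify lowercases the title
#   - slugify replaces spaces with hyphens
#
# After encoding the first item, the smallest change that made the
# failure go away was:
#
#     def slugify(title):
#         return title.lower()
#
# Encoding the second item produced a new failure, and only that
# failure justified the next change:


def slugify(title):
    return title.lower().replace(" ", "-")


def test_slugify_lowercases_the_title():
    assert slugify("Hello") == "hello"


def test_slugify_replaces_spaces_with_hyphens():
    assert slugify("hello world") == "hello-world"


# Deliberately not written: stripping punctuation, collapsing repeated
# hyphens, and so on. No item on the list asks for them, so adding them
# now would be speculative code that no test exercises.
```

If punctuation handling turns out to matter, it goes onto the list as a new specification first, and the code follows only once its test is failing.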

My TDD skill contains a bit more detail but this is the essence of the process. This process doesn't have much influence over the design of the tests themselves, though, so I have a different skill for that, Test Design Review. Test Design Review spawns a separate agent (in an effort to avoid bias), looks for violations of design principles (such as a test focusing on means rather than ends) and makes suggestions for fixes. Sometimes the "fixes" are dubious but usually they're on the mark. When I'm not satisfied with the way my agent has written a certain test, I run Test Design Review to try to let the agent catch its own mistakes.
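One way to picture the difference between a test that focuses on means and one that focuses on ends is the sketch below. The `Cart` class and both tests are hypothetical, and the exact rules Test Design Review applies may well differ from this.

```python
from unittest import mock


class Cart:
    def __init__(self, prices):
        self.prices = prices

    def _subtotal(self):
        return sum(self.prices)

    def total(self, discount_rate=0.0):
        return self._subtotal() * (1 - discount_rate)


def test_total_calls_subtotal():
    # Focuses on means: asserts on how total() works internally.
    # It breaks if _subtotal is renamed or inlined, and it would
    # still pass if the discount arithmetic were wrong.
    cart = Cart([10, 20])
    with mock.patch.object(Cart, "_subtotal", return_value=30) as subtotal:
        cart.total(discount_rate=0.5)
    subtotal.assert_called_once()


def test_total_applies_discount_to_sum_of_prices():
    # Focuses on ends: asserts on the result a caller actually observes.
    cart = Cart([10, 20])
    assert cart.total(discount_rate=0.5) == 15
```

Both tests pass, but only the second one says anything about the behavior a caller depends on, and only the second one survives a refactoring that leaves that behavior unchanged.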

General design review

Many test design violations are just violations of general software design principles, such as the principle of "call things what they are". In addition to feeding my tests through my Test Design Review skill, I like to feed them through my Software Design Review skill as well.

My agent surprises me sometimes. In my TDD skill I included an instruction, without much expectation that it would particularly be followed, that if it turns out to be hard to write the test we want to write, that might be a sign we need to "clean the kitchen before we make dinner". For whatever reason, Claude has really taken this to heart, and it quite often asks if perhaps we should clean the kitchen, and quite often it's the case that we should.

I haven't yet gotten my agents to write acceptable tests 100% of the time, not by a long shot, but my TDD skill has worked well enough for me that it has become my default way of making any change. It's not surprising to me that applying these TDD and test design principles yields such good results. In my judgment, the biggest AI productivity gains come when AI is combined with timeless, immutable principles which were discovered decades ago, which hold just as true today, and which, no matter what new technologies may arise, will never cease to be useful.
