# Tools Are Harness Too

> Source: <https://tacoda.medium.com/tools-are-harness-too-41f631350af4?source=rss-bf474619cf47------2>
> Published: 2026-06-22 15:30:29+00:00

I had a rule in my project CLAUDE.md for months that said *always check Code Health before committing*. The agent followed it about 70% of the time. The other 30%, it forgot; usually on the longer tasks, when its context window was filling up and the rules at the start of the prompt were losing weight against the code at the end. I added emphasis. The 70% became 80%. I added a section break. 85%. I rewrote the rule in capital letters with three exclamation points. Same 85%.

Then I gave the agent a [Code Health](https://codescene.com/product/code-health) tool. Not a rule. A tool — an MCP server entry point called code_health_safeguard that returned a structured analysis of the staged diff. The next day, the agent ran it before every commit. The day after that. The day after that. The compliance went to 100% and stayed there.

The difference between a rule and a tool is the difference between *telling the agent what to do* and *making the right thing easier than the wrong thing*. Both belong in a working harness. Tools are the half nobody talks about.

A rule is a paragraph. The agent reads it, weighs it against the other things in its context, and produces output. If the rule wins the weighing, the agent follows it. If something else wins (the code’s existing pattern, a more recently mentioned constraint, the user’s apparent intent in the prompt) the rule loses.

The probability the rule wins is not 100%. It is some number less than 100%, and the number drops as the context grows. Rules in long-running agent sessions are like introductory remarks at a long meeting: present, recorded, mostly forgotten by hour three.

A tool is different. A tool is an *affordance: *a thing the agent can do, exposed at the same level as everything else the agent does. The agent’s working surface is *I can read files, edit files, run shell commands, and call these specific functions*. A tool slots into that list. The agent doesn’t have to weigh a paragraph against another paragraph; it sees a function and decides whether to call it.

The decision shifts from *should I obey this rule* to *should I use this capability*. The shift matters because the decision happens at a different layer in the agent’s reasoning. Capability decisions are cheaper than rule-weighing decisions. They survive the long-context drift.

Not every rule should become a tool. The translation has a cost. Tools take time to write, they introduce new surface area to maintain, and they only earn their place when they replace a rule that fails often enough to be worth replacing.

A rule belongs as a tool when:

For everything else, rules are still the right unit. Tone, naming conventions, design preferences, when-to-use-X-vs-Y judgment calls: those are paragraphs, not function calls. Trying to encode them as tools is the path to a brittle harness with thirty checks and no taste.

To make the trade-off concrete, here’s the rule-to-tool conversion that started this post.

The rule, in CLAUDE.md:

```
## Code Health is authoritativeBefore suggesting a commit, run a Code Health check on the changed files.If any file regresses below 9.0, refactor before committing. Do not commitcode with a Code Health below the project floor.
```

The rule is fine. It’s specific, it names the threshold, it tells the agent what to do. It just doesn’t fire reliably under load.

The tool is the CodeScene MCP server’s pre_commit_code_health_safeguard. It exposes itself to the agent as a callable function. The function takes the staged diff and returns a structured response — files checked, scores before and after, regressions flagged, suggested refactorings. The agent doesn’t have to remember to do the right thing; the right thing is one tool call away, and *not* calling it now looks conspicuous next to the call to git diff --cached that the agent makes anyway.

The rule still lives in CLAUDE.md, with one important change. It now says *call **pre_commit_code_health_safeguard before suggesting a commit*. The rule names the tool. The tool does the work. The rule is shorter and the agent’s compliance is higher.

That’s the conversion shape: a rule that names a tool, and a tool that does the check the rule used to describe. Both layers are present. Each is cheaper than it would be alone.

The other reason tools matter is that they let you design constraints that rules can’t express.

Consider the constraint *the agent should never delete production data*. As a rule, it’s a sentence: *don’t run destructive operations against production*. The rule is correct and entirely advisory. As a tool, it’s the inverse. The absence of any production-database-write capability in the tool list at all. The agent literally cannot call what isn’t exposed.

This is the difference between *advisory constraint* and *constraint by construction*. Rules are advisory. Tools are construction. The harness designer chooses where on that spectrum to put each constraint.

The rule of thumb I use: constraints that are critical to safety belong as constraints by construction. Tools the agent shouldn’t use shouldn’t be in the tool list. Permissions the agent shouldn’t have shouldn’t be granted. The advisory layer (rules) is for guidance; the construction layer (tools, permissions, capabilities) is for safety.

When I’m reading a harness, the first place I look isn’t the rules. It’s the tool list and the permission allowlist. The tool list tells me what the agent *can* do. The rules tell me what it *should* do. The safety properties live in the first answer, not the second.

MCP (the Model Context Protocol) is the standard most modern agents use to expose tools. The mechanics are simple: a server runs somewhere (often as a subprocess of the editor), declares a set of capabilities to the agent, and handles requests when the agent calls them.

The capabilities are arbitrary. They can wrap a CLI, an HTTP API, a database, a local file system, a third-party service, or pure computation. Whatever can be wrapped becomes a tool.

A few of the MCP servers I find load-bearing in my own work:

The third one is the one most teams skip and shouldn’t. The project’s own MCP server is where the team’s domain constraints get encoded as capabilities. A validate_contract tool, a verify_acceptance_criteria tool, a check_naming_convention tool — each takes a rule that used to live in CLAUDE.md and turns it into something callable.

The honest cost of tools is that they’re harder to write than rules. A rule is a paragraph in a markdown file. A tool is a function with a typed schema, error handling, an installer, and a test surface. Writing a good tool takes an afternoon. Writing the rule it replaces takes ten minutes.

The afternoon pays back, but only for the right tools. The rules to convert are the ones with high cost (the rule has been violated multiple times, each violation has a real cost) and high check-ability (the check is mechanical). The rules to leave alone are the ones with low cost or low check-ability. They’re cheaper to keep as rules and live with the occasional miss.

There’s a second tradeoff. Tools accumulate. Every tool added to the agent’s surface is another option it considers. A tool list that grows to fifty capabilities slows the agent’s decision-making and pushes the relevant tools further out of immediate reach. Tools, like rules, deserve their own audit cadence in the same way rule-rot eats at rules, tool sprawl eats at agents.

Keep the load-bearing tools. Delete the ones the agent never calls. Watch the same pattern as the rules: which tools earn their slot, which sit unused.

If rules are the first layer and sensors are the second, tools are the third. Each layer does work the others can’t.

A working harness uses all three. The rule says *do X*. The tool makes *X* a callable capability. The sensor verifies *X* happened. Each layer has a different cost and a different power. The mistake is to load everything into one layer because that layer is comfortable.

The teams I see with rules-only harnesses ship the same bugs over and over with great-sounding documentation. The teams I see with tools-only harnesses ship code that compiles but doesn’t match the team’s conventions. The teams I see thriving have a balance — rules for the guidance, tools for the capability, sensors for the gate.

The most powerful arrangement is when a rule references a tool by name. The rule loads into context, communicates the intent, and tells the agent which tool to reach for. The tool, when called, produces the deterministic answer the rule was hand-waving toward.

```
## Verifying schema changesBefore writing a migration that adds, drops, or alters a column, call`db_schema_dump` to get the current schema. Confirm the columns you referenceexist and are typed as you expect. Cite the schema dump in your reasoning sothe reviewer can see you verified.
```

That’s a rule paired with a tool. The rule does what rules do: it explains the intent, names the situations it applies to, and tells the agent how to handle them. The tool does what tools do: it provides the actual capability — fetching the current schema, in a structured form, deterministically.

The pair is stronger than either piece alone. The rule alone would be ignored on long contexts. The tool alone wouldn’t know when to be called. Together, they cover the case.

Pick a rule from your CLAUDE.md that names a check. *Always run the linter. Verify the schema first. Check the test coverage.* Anything that says *always X* where *X* is mechanical.

Look at whether there’s an MCP server that already does *X*. If so, install it, expose the tool to the agent, and rewrite the rule to name the tool. If not, look at the size of *X*. If it’s small (a shell command, a script), write a one-tool MCP server that exposes the script. Most agent harnesses make this surprisingly easy now.

Ship the tool. Update the rule to name the tool. Watch the next ten agent runs. The rule that was complied with 80% of the time becomes 100% almost overnight, because the agent isn’t being asked to remember anymore. The capability is right there.

Rules tell. Tools enforce. The harness with both is the harness the agent doesn’t drift out of.
