# The Loadout Pattern: Handing the Wheel to an Autonomous LLM

> Source: <https://dev.to/bighaeil/the-loadout-pattern-handing-the-wheel-to-an-autonomous-llm-29lj>
> Published: 2026-06-29 07:32:54+00:00

Conventional automation **executes** a procedure: code runs a fixed sequence of steps and decides
nothing; same input, same path, every time. The loadout pattern keeps the steps but moves the
*deciding* to the model. At each step the **brain** (an autonomous LLM) **judges**: what matters,
which tool to reach for, whether to act at all. It's handed a **purpose** and the latitude to pursue
it, and it *drives*, choosing its own tools as it goes. **Code executes; the brain decides.** Those
tools come as a **loadout**, a curated, self-describing set drawn from a shared **toolbox**, and
the brain is observed at the **interface** it calls, not by the side effects it leaves behind. The
model is the driver; your system is the suit it wears. Everything below is how to build that.

Most LLM integrations bolt a model *into* your code. This is about the opposite: letting the
model *drive* your system, equipping itself, on its own initiative, with a *loadout*: the
curated, self-describing set of tools it picks for each mission. The system stops being the
program that calls an LLM, and becomes the *suit* the LLM wears.

*Audience: engineers building agentic/automation systems. There's code, and there's a bit of
philosophy — because the philosophy is what makes the code shaped the way it is.*

Two words, kept distinct (the whole post hinges on this):

A **toolbox** (or catalog) is *every* tool you own: the whole armory.

A **loadout** is the curated subset a routine equips *for one mission*: what it actually suits
up with. The entire MCP server is a toolbox; a loadout is the handful of tools one routine is
handed at wake.

In a typical LLM integration the model lives **inside** your process. Your code calls it:

```
answer = agent.invoke({"input": "What changed in the market overnight?"})
```

This is great for **human-triggered** work: a person asks, the system fetches and answers. The
human is the caller; nothing happens until they show up. The LLM is a *component*: a function
your program calls and pays per token to use.

This post is about the other mode: **the LLM doing the work on its own initiative.** A routine wakes
on a schedule and gets on with it: digesting overnight news every hour, posting a morning briefing,
watching a queue, reconciling a ledger. No one asked; the routine is its own caller. Wake the model
on a cron (say, a headless Claude Code session every hour) and it is no longer a component inside
your program. It's *outside*, periodically taking the wheel and deciding what to do.
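A minimal sketch of what "waking on a cron" could look like in shell. Everything named here is an assumption for illustration: the `wake` function, the `BRAIN_CMD` variable, and the paths in the crontab comment are hypothetical, and `BRAIN_CMD` stands for whatever command starts your headless model session.

```shell
#!/usr/bin/env bash
# wake.sh (hypothetical): cron runs this; nobody asked, the routine is its own caller.
# Example crontab line (assumed paths), once an hour:
#   0 * * * * /opt/routines/wake.sh /opt/routines/skills/news-digest.md
set -euo pipefail

# wake <skill-file>: pipe the skill text into the brain command on stdin.
# BRAIN_CMD is an assumption: set it to whatever launches your headless
# model session. It defaults to `cat`, which only echoes the skill back.
wake() {
  local skill="$1"
  [ -r "$skill" ] || { echo "wake: cannot read skill: $skill" >&2; return 1; }
  # Left unquoted on purpose so BRAIN_CMD may carry its own arguments.
  ${BRAIN_CMD:-cat} < "$skill"
}

# When run as a script with an argument, wake that skill.
if [ "$#" -gt 0 ]; then
  wake "$1"
fi
```

With the default `BRAIN_CMD`, running the script simply prints the skill, which is a cheap way to check the cron wiring before a model is attached.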

The line that matters isn't human-vs-cron, and it isn't even steps-or-no-steps. It's **executing
versus deciding**: a script runs its steps and decides nothing, while the brain, even when it
follows the same steps, judges at each one.

That inversion changes what your system should be.

Three layers, and it matters which is which:

Here's the leverage that falls out of this: **you don't hand-author JARVIS's intelligence.** It
comes from the model, and it improves when you swap in a better model, not when you write more
code. What you *build* is the *suit*: what the brain can sense, remember, and do. So the central
question of the whole system becomes: *how do we equip the brain well (give it the right loadout)
and let it reach for the right tool at the right moment?*

When you first wire a cron-woken routine, you write a prompt ("skill") that mixes two very
different things: the **mission** (what to judge, the actual work) and the **mechanics** (raw
`curl`, database queries, hardcoded IDs). A real before-state:

```
# news-digest skill (before)
1. Query Mongo for new headlines since the watermark:
   docker exec db mongosh app --eval 'db.news.find({publishedAt:{$gt: ...}})...'
2. Decide which are new stories vs updates vs noise.  ← the actual mission
3. Post the briefing:
   curl -X POST http://localhost:9000/notify -d '{"type":"SIGNAL", ...}'
   Then create a Notion page: data_source_id "<your-notion-data-source>", icon "📰", ...
```

Two problems compound. First, the mission (step 2, the judgment) is drowned in plumbing. Second,
every *other* routine that needs to "post a notification" re-describes that same `curl` in its own
prompt. Change the notification URL and you edit five skills. The mechanics are copy-pasted prose.

Split the system along the seam between **interface** and **implementation**.

**1. Tools are named capabilities: a stable name over a swappable implementation.** Most are small,
dumb, independent scripts, but the *name* is the only thing the brain depends on; what sits behind
it is free to vary. Usually it wraps mechanics (a `curl`, a DB query, a stubbed no-op, a different
backend tomorrow). But a tool can just as well **hand off to another agent** (a sub-brain with its
own loadout) or **trigger the next task** in a pipeline. To the brain it's all the same: a name it
can reach for. So a tool is sometimes an interface over mechanics, and sometimes the *next move*:
another agent, or the start of the next step. `notify` sends a notification; `read_news` reads. They
don't know about each other. Together, all of them are your **toolbox** (the catalog).

``` bash
#!/usr/bin/env bash
# notify.sh — send a notification (hides the URL/payload mechanics)
set -euo pipefail
[ "${1:-}" = "--describe" ] && { echo "notify|action|send a notification"; exit 0; }
TYPE="$1"; TITLE="$2"; MSG="$3"
payload="$(jq -n --arg t "$TYPE" --arg ti "$TITLE" --arg m "$MSG" '{type:$t,title:$ti,message:$m}')"
curl -s -X POST "${NOTIFY_URL:-http://localhost:9000/notify}" \
     -H 'Content-Type: application/json' -d "$payload"
```

**2. Tools describe themselves.** One line, `--describe`, is the single source of truth for what
the tool is. Not the skill, not a wiki: the tool.
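Since the describe line is the only contract, it can be worth checking that a tool honors it. The sketch below is an assumption, not part of the original toolbox: `check_describe` is a hypothetical helper that accepts a tool only if `--describe` prints one line of the form `name|kind|description`, the shape `notify.sh` uses.

```shell
#!/usr/bin/env bash
set -euo pipefail

# check_describe <tool-script>: succeed only if `--describe` prints exactly
# one line shaped like name|kind|description, with no empty field.
# (Hypothetical helper; the three-field shape is taken from notify.sh.)
check_describe() {
  local tool="$1" line name kind desc
  line="$(bash "$tool" --describe)" || return 1
  # Reject multi-line output: the description must be a single line.
  case "$line" in *$'\n'*) return 1 ;; esac
  IFS='|' read -r name kind desc <<<"$line"
  [ -n "$name" ] && [ -n "$kind" ] && [ -n "$desc" ]
}
```

A loop such as `for f in tools/*.sh; do check_describe "$f" || echo "bad describe: $f"; done` would run the check over a whole toolbox; the `tools/` path is likewise assumed.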

**3. A loadout assembler hands the brain its kit.** Given a list of tool names, it prints each tool's self-description.

``` bash
#!/usr/bin/env bash
# loadout.sh <tool> [tool...] — print the self-descriptions of the named tools (the loadout)
set -euo pipefail
DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
echo "🧰 loadout for this mission"
for t in "$@"; do
  IFS='|' read -r name kind desc <<<"$(bash "$DIR/$t.sh" --describe)"
  echo "  - $name ($kind): $desc"
done
```

**4. The skill becomes mission only.** It states what to do and names its loadout. The mechanics stay behind the tool names.

```
# news-digest skill (after)
## Loadout — download at start
bash tools/loadout.sh read_news write_story notify publish_notion

## Mission
Turn new headlines into a running ledger of stories: skip repeats, extend ongoing
stories, open new ones, ignore noise. Each morning, post a briefing from the ledger.
```

**5. The brain thinks for itself; tools don't auto-chain.** Keep `notify` and `publish_notion`
separate; do *not* make "writing to Notion" secretly also send a notification. The moment you fuse
two tools in the plumbing you've frozen a policy: you can no longer publish quietly, or notify
without publishing. Leave the tools independent and let the *brain* reason about whether to call one,
the other, or both. The thinking is the brain's job; the wiring must not pre-decide it.
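To make the point concrete, here is a small sketch with stand-in tools. The function bodies are placeholders and `run_calls` is a hypothetical harness: it only replays whatever sequence of calls the brain chose, so all three policies (publish quietly, notify only, both) stay available.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in tools: each does one thing and knows nothing about the other.
publish_notion() { echo "published: $1"; }
notify()         { echo "notified: $1"; }

# Anti-pattern (do not do this): a fused tool freezes the policy
# "every publish also notifies" into the plumbing.
#   publish_notion() { echo "published: $1"; notify "$1"; }

# run_calls <tool:title>...: apply the calls the brain decided on, in order.
# The caller owns the choice; this function adds no policy of its own.
run_calls() {
  local call tool title
  for call in "$@"; do
    tool="${call%%:*}"     # text before the first colon
    title="${call#*:}"     # text after the first colon
    case "$tool" in
      publish_notion|notify) "$tool" "$title" ;;
      *) echo "unknown tool: $tool" >&2; return 1 ;;
    esac
  done
}
```

`run_calls "publish_notion:Draft"` publishes without notifying, while `run_calls "publish_notion:Brief" "notify:Brief"` does both. With the fused version, the first would be impossible.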

**From the model's point of view, this is the whole win.** When the routine wakes, it receives two
cleanly separated things: a **mission** (what to accomplish and how to judge it) and a
**loadout** (the named capabilities it is allowed to use). It never has to excavate the *how* (a
URL, a query, an ID) out of the *what*; the mechanics are simply not in its field of view, leaving
only the decision and the set of moves available to make it. The skill carries judgment (which
changes often); the toolbox carries capability (stable, shared); a loadout is just the names a
routine picks from it. A new routine lists tool names and gets their descriptions for free: change
a URL and you edit one tool, not five prompts.
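A throwaway demonstration of "lists tool names and gets their descriptions for free". This is a sketch under assumptions: the toolbox lives in a temp directory, `make_tool` is a hypothetical helper that writes stub tools which only self-describe, the `sense` kind and the stub descriptions are invented for illustration, and `loadout` mirrors the assembler script as a function.

```shell
#!/usr/bin/env bash
set -euo pipefail

TOOLS="$(mktemp -d)"   # throwaway toolbox for the demo

# make_tool <name> <kind> <description>: write a stub that only self-describes.
make_tool() {
  printf '#!/usr/bin/env bash\n[ "${1:-}" = "--describe" ] && { echo "%s|%s|%s"; exit 0; }\n' \
    "$1" "$2" "$3" > "$TOOLS/$1.sh"
}

# The toolbox: every tool you own.
make_tool notify         action "send a notification"
make_tool read_news      sense  "read new headlines"
make_tool publish_notion action "create a Notion page"

# loadout <tool>...: print the self-descriptions of just the named tools.
loadout() {
  local t name kind desc
  for t in "$@"; do
    IFS='|' read -r name kind desc <<<"$(bash "$TOOLS/$t.sh" --describe)"
    echo "  - $name ($kind): $desc"
  done
}

# A new routine only names what it needs; the descriptions come from the tools.
loadout read_news notify
```

Three tools exist, but this routine sees two. Editing a description, or the mechanics behind a name, touches one file in the toolbox and no skill.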

Side effects are not proof. A notification arriving does not establish that the model invoked the
tool, and a tool whose implementation is a no-op stub produces no side effect at all even when the
model used it correctly. Verifying behavior therefore means observing the **interface** (the
moment a tool is called) separately from what its implementation did.

Each tool logs at that boundary:

``` bash
# _log.sh (sourced by every tool)
tlog() {  # tlog <event> [detail]   event: INVOKED | OK | DRY | ERR
  printf '%s | %-12s | %-7s | %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(basename "$0" .sh)" "$1" "${*:2}" \
    >> "$LOG_FILE"
}
```

The log separates two questions that side effects conflate:

```
... | notify | INVOKED | SIGNAL | Morning briefing   # the interface was called
... | notify | OK      | SIGNAL HTTP 200             # the implementation sent it
... | notify | DRY     | SIGNAL                      # called, but did not send (DRY_RUN)
```

`INVOKED` records that the model used the tool, independent of any outcome; `OK`/`DRY`/`ERR`
records what the implementation did. Because the model depends on the interface rather than the
implementation, the same routine can run in a **shadow mode**, where `notify` only logs and never
sends, with no change in the model's behavior. The boundary log is also the reliable way to audit
a past run: it records what executed, not merely what the skill instructed.
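One way the pieces could fit together, as a sketch rather than the author's actual tools: `notify` here is a function instead of a script, `tlog` takes the tool name as an explicit first argument, and `send_notification` and `invoked_count` are hypothetical helpers. The `DRY_RUN=1` convention and the default log location are assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

LOG_FILE="${LOG_FILE:-$(mktemp)}"   # assumption: all tools append to one log

# tlog <tool> <event> [detail]: same line format as the helper above.
tlog() {
  printf '%s | %-12s | %-7s | %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" "${*:3}" >> "$LOG_FILE"
}

# send_notification: hypothetical stand-in for the real mechanics.
send_notification() {
  curl -fsS -X POST "${NOTIFY_URL:-http://localhost:9000/notify}" \
    -H 'Content-Type: application/json' \
    -d "$(jq -n --arg t "$1" --arg ti "$2" --arg m "$3" \
          '{type:$t,title:$ti,message:$m}')" >/dev/null
}

# notify <type> <title> <message>
notify() {
  local type="$1" title="$2" msg="$3"
  tlog notify INVOKED "$type | $title"      # the interface was called
  if [ "${DRY_RUN:-0}" = "1" ]; then
    tlog notify DRY "$type"                  # shadow mode: log, never send
    return 0
  fi
  if send_notification "$type" "$title" "$msg"; then
    tlog notify OK "$type"
  else
    tlog notify ERR "$type"
    return 1
  fi
}

# invoked_count <tool>: audit the log for how often the interface was called.
invoked_count() {
  awk -F'|' -v t="$1" '
    { gsub(/ /, "", $2); gsub(/ /, "", $3) }
    $2 == t && $3 == "INVOKED" { n++ }
    END { print n + 0 }' "$LOG_FILE"
}
```

Running `DRY_RUN=1 notify SIGNAL "Morning briefing" "..."` appends an `INVOKED` line and a `DRY` line and sends nothing; `invoked_count notify` then answers "did the model reach for the tool?" without relying on any side effect.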

This is a pattern, not a framework — and that's its honest limit: **nothing enforces it at
runtime.** There's no base class, no inversion of control, nothing that

Go back to the suit. You upgrade the brain by adopting a better model: that's not code you write,
it's a model you swap in. Your day-to-day engineering goes into the equipment: what the brain can
discover, reach for, and be observed using. And because the brain depends on interfaces (a loadout
of named tools), the suit is model-agnostic: change the model and the same loadout still fits. A
self-describing, observable loadout is precisely how the brain *takes the wheel*: it wakes,
downloads the tools it's allowed, sees what it can do, and acts, and you can watch it do so at the
interface, not by guessing from side effects. The system stops being a program that occasionally
calls a model, and becomes a suit a capable model wears.

Each tool carries a `--describe` line and a boundary log (`INVOKED` + `OK/DRY/ERR`). Side-effecting
tools support a `DRY_RUN`. Runnable examples are in [examples/](https://github.com/bighaeil/agent-loadout-pattern/tree/main/examples).
