The Loadout Pattern: Handing the Wheel to an Autonomous LLM

Conventional automation executes a procedure — code runs a fixed sequence of steps and decides

nothing; same input, same path, every time. The loadout pattern keeps the steps but moves the

deciding to the model. At each step the brain — an autonomous LLM — judges: what matters,

which tool to reach for, whether to act at all. It's handed a purpose and the latitude to pursue

it, and it drives — choosing its own tools as it goes. Code executes; the brain decides. Those

tools come as a loadout — a curated, self-describing set drawn from a shared toolbox — and

the brain is observed at the interface it calls, not by the side effects it leaves behind. The

model is the driver; your system is the suit it wears. Everything below is how to build that.

Most LLM integrations bolt a model into your code. This is about the opposite: letting the model drive your system, equipping itself, on its own initiative, with a loadout: the curated, self-describing set of tools it picks for each mission. The system stops being the program that calls an LLM, and becomes the suit the LLM wears.

Audience: engineers building agentic/automation systems. There's code, and there's a bit of philosophy — because the philosophy is what makes the code shaped the way it is.

Two words, kept distinct (the whole post hinges on this):

A toolbox (or catalog) is every tool you own: the whole armory.

A loadout is the curated subset a routine equips for one mission: what it actually suits up with. The entire MCP server is a toolbox; a loadout is the handful of tools one routine is handed at wake.

In a typical LLM integration the model lives inside your process. Your code calls it:

answer = agent.invoke({"input": "What changed in the market overnight?"})

This is great for human-triggered work: a person asks, the system fetches and answers. The

human is the caller; nothing happens until they show up. The LLM is a component — a function

your program calls and pays per token to use.

This post is about the other mode: the LLM doing the work on its own initiative. A routine wakes

on a schedule and gets on with it — digesting overnight news every hour, posting a morning briefing,

watching a queue, reconciling a ledger. No one asked; the routine is its own caller. Wake the model

on a cron — say, a headless Claude Code session every hour — and it is no longer a component inside

your program. It's outside, periodically taking the wheel and deciding what to do.
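A minimal sketch of what the wake step could look like, assuming the skill lives in a text file and the brain is a headless CLI that accepts the prompt as an argument. `wake_routine`, `BRAIN_CMD`, and the paths in the comments are hypothetical names for illustration; the default command is Claude Code's non-interactive print mode.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical wake script. A crontab entry such as
#   0 * * * * /opt/routines/wake.sh /opt/routines/skills/news.md
# would run it at the top of every hour (both paths are placeholders).

# wake_routine <skill-file>: hand the skill text to the brain as its prompt.
# BRAIN_CMD is an assumed override for this sketch, useful for testing.
wake_routine() {
  local skill_file="$1"
  [ -r "$skill_file" ] || { echo "wake: cannot read $skill_file" >&2; return 1; }
  local -a brain
  read -r -a brain <<<"${BRAIN_CMD:-claude -p}"
  "${brain[@]}" "$(cat "$skill_file")"
}

# Only wake when a skill file is given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then wake_routine "$1"; fi
```

Nothing in the script decides anything: it only delivers the mission. Everything after that point is up to the model.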

The line that matters isn't human-vs-cron — and it isn't even steps-or-no-steps. It's executing versus deciding: a script runs its steps and decides nothing, while the brain — even when it

That inversion changes what your system should be.

Three layers, and it matters which is which:

Here's the leverage that falls out of this: you don't hand-author JARVIS's intelligence. It

comes from the model — and it improves when you swap in a better model, not when you write more

code. What you build is the suit — what the brain can sense, remember, and do. So the central

question of the whole system becomes: how do we equip the brain well — give it the right loadout — and let it reach for the right tool at the right moment?

When you first wire a cron-woken routine, you write a prompt ("skill") that mixes two very

different things: the mission (what to judge, the actual work) and the mechanics (raw curl, database queries, hardcoded IDs). A real before-state:

1. Query Mongo for new headlines since the watermark:
   docker exec db mongosh app --eval 'db.news.find({publishedAt:{$gt: ...}})...'
2. Decide which are new stories vs updates vs noise.  ← the actual mission
3. Post the briefing:
   curl -X POST http://localhost:9000/notify -d '{"type":"SIGNAL", ...}'
   Then create a Notion page: data_source_id "<your-notion-data-source>", icon "📰", ...

Two problems compound. First, the mission (step 2 — judgment) is drowned in plumbing. Second,

every other routine that needs to "post a notification" re-describes that same curl in its own prompt. Change the notification URL and you edit five skills. The mechanics are copy-pasted prose.

Split the system along the seam between interface and implementation.

1. Tools are named capabilities — a stable name over a swappable implementation. Most are small,

dumb, independent scripts, but the name is the only thing the brain depends on; what sits behind

it is free to vary. Usually it wraps mechanics (a curl, a DB query, a stubbed no-op, a different backend tomorrow). But a tool can just as well hand off to another agent (a sub-brain with its own loadout) or trigger the next task in a pipeline. To the brain it's all the same: a name it can reach for. So a tool is sometimes an interface over mechanics, and sometimes the next move: another agent, or the start of the next step. notify sends a notification; read_news reads. They don't know about each other. Together, all of them are your toolbox (the catalog).

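#!/usr/bin/env bash
# notify.sh: send a notification. Usage: notify.sh <type> <title> <message>
set -euo pipefail
# Self-description, one line: name|kind|description
[ "${1:-}" = "--describe" ] && { echo "notify|action|send a notification"; exit 0; }
TYPE="$1"; TITLE="$2"; MSG="$3"
# Build the JSON with jq so quotes in the arguments are escaped correctly
payload="$(jq -n --arg t "$TYPE" --arg ti "$TITLE" --arg m "$MSG" '{type:$t,title:$ti,message:$m}')"
curl -s -X POST "${NOTIFY_URL:-http://localhost:9000/notify}" \
     -H 'Content-Type: application/json' -d "$payload"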

2. Tools describe themselves. One line, --describe, is the single source of truth for what the tool is. Not the skill, not a wiki: the tool.
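Because the describe line is the single source of truth, it is worth checking that every tool actually emits one in the expected shape. The following is a sketch, not part of the original toolbox: `valid_describe_line` and `check_toolbox` are hypothetical helpers, and the rule that names are lowercase snake_case is an assumption made here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# valid_describe_line <line>: succeed only for name|kind|description with
# three non-empty fields. The snake_case naming rule is an assumption.
valid_describe_line() {
  local re='^[a-z][a-z0-9_]*[|][a-z]+[|][^|]+$'
  [[ "$1" =~ $re ]]
}

# check_toolbox <dir>: run --describe on every tool script and report the
# ones whose answer is malformed. Returns 1 if any tool fails the check.
check_toolbox() {
  local dir="$1" tool line bad=0
  for tool in "$dir"/*.sh; do
    [ -e "$tool" ] || continue
    # Skip the assembler itself, assuming it sits next to the tools.
    case "$(basename "$tool")" in loadout.sh) continue ;; esac
    line="$(bash "$tool" --describe)" || line=""
    if ! valid_describe_line "$line"; then
      echo "malformed --describe: $tool" >&2
      bad=1
    fi
  done
  return "$bad"
}
```

Running `check_toolbox tools` before a deploy would catch a tool that forgot its describe line before the brain ever sees a blank entry in its loadout.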

3. A loadout assembler hands the brain its kit. Given a list of tool

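#!/usr/bin/env bash
# loadout.sh: print name, kind, and description for each requested tool.
set -euo pipefail
# Tools live next to this script, one <name>.sh per tool
DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
echo "🧰 loadout for this mission"
for t in "$@"; do
  # Each tool answers --describe with one line: name|kind|description
  IFS='|' read -r name kind desc <<<"$(bash "$DIR/$t.sh" --describe)"
  echo "  - $name ($kind): $desc"
done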
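One thing the assembler above does not do is reject a name that is not in the toolbox, so a misspelled tool is easy to miss. A stricter variant might look like the sketch below. `assemble_loadout` is a hypothetical function name, and taking the tools directory as the first argument is an assumption made so the example is self-contained.

```shell
#!/usr/bin/env bash
set -euo pipefail

# assemble_loadout <tools-dir> <tool-name>...
# Prints one line per tool from its --describe output, and refuses names
# that have no matching script instead of printing an empty entry.
assemble_loadout() {
  local dir="$1"; shift
  local t name kind desc
  echo "loadout for this mission"
  for t in "$@"; do
    if [ ! -f "$dir/$t.sh" ]; then
      echo "loadout: no such tool in toolbox: $t" >&2
      return 1
    fi
    IFS='|' read -r name kind desc <<<"$(bash "$dir/$t.sh" --describe)"
    echo "  - $name ($kind): $desc"
  done
}
```

With the notify tool shown earlier in a `tools` directory, `assemble_loadout tools notify` prints the header followed by `  - notify (action): send a notification`.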

4. The skill becomes mission only. It states what to do and names its loadout. The

## Loadout — download at start
bash tools/loadout.sh read_news write_story notify publish_notion

## Mission
Turn new headlines into a running ledger of stories: skip repeats, extend ongoing
stories, open new ones, ignore noise. Each morning, post a briefing from the ledger.

5. The brain thinks for itself: tools don't auto-chain. Keep notify and publish_notion separate; do not make "writing to Notion" secretly also send a notification. The moment you fuse

two tools in the plumbing you've frozen a policy — you can no longer publish quietly, or notify

without publishing. Leave the tools independent and let the brain reason about whether to call one,

the other, or both. The thinking is the brain's job; the wiring must not pre-decide it.
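To make the independence concrete, here is a small sketch with stub tools. The stubs only record that they ran (the `CALLS` array is a stand-in for real side effects), and the three decision functions are hypothetical: each stands for one outcome the brain could reach on a given wake, not for anything wired into the tools.

```shell
#!/usr/bin/env bash
set -euo pipefail

CALLS=()   # records which tools ran, in order

# Two independent stub tools. Neither one calls the other.
publish_notion() { CALLS+=("publish_notion:$1"); }
notify()         { CALLS+=("notify:$1"); }

# Three decisions the brain might reach. Because the tools are not fused,
# all three remain available.
publish_quietly()    { publish_notion "$1"; }
notify_only()        { notify "$1"; }
publish_and_notify() { publish_notion "$1"; notify "$1"; }
```

If `publish_notion` called `notify` internally, `publish_quietly` could not exist: every publish would also notify, whatever the brain had judged.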

From the model's point of view, this is the whole win. When the routine wakes, it receives two

cleanly separated things: a mission — what to accomplish and how to judge it — and a

loadout — the named capabilities it is allowed to use. It never has to excavate the how (a

URL, a query, an ID) out of the what; the mechanics are simply not in its field of view, leaving

only the decision and the set of moves available to make it. The skill carries judgment (which

changes often); the toolbox carries capability (stable, shared); a loadout is just the names a

routine picks from it. A new routine lists tool names and gets their descriptions for free — change

a URL and you edit one tool, not five prompts.

Side effects are not proof. A notification arriving does not establish that the model invoked the

tool, and a tool whose implementation is a no-op stub produces no side effect at all even when the

model used it correctly. Verifying behavior therefore means observing the interface — the

moment a tool is called — separately from what its implementation did.

Each tool logs at that boundary:

tlog() {  # tlog <event> [detail]   event: INVOKED | OK | DRY | ERR
  printf '%s | %-12s | %-7s | %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(basename "$0" .sh)" "$1" "${*:2}" \
    >> "$LOG_FILE"
}
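A sketch of how a tool might call the logger at its boundary, under a few assumptions: the tool is written as a function here, so the tool name is passed to `tlog` explicitly rather than taken from `$0`; `DRY_RUN=1` is the assumed switch for not sending; `send_notification` is a hypothetical transport helper; and the default log path is a placeholder.

```shell
#!/usr/bin/env bash
set -euo pipefail

LOG_FILE="${LOG_FILE:-/tmp/tools.log}"   # placeholder default

# Same line format as the tlog above, with the tool name as first argument.
tlog() {  # tlog <tool> <event> [detail]
  printf '%s | %-12s | %-7s | %s\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2" "${*:3}" >> "$LOG_FILE"
}

# Hypothetical transport: swap in whatever the tool really wraps.
send_notification() {
  curl -sf -X POST "${NOTIFY_URL:-http://localhost:9000/notify}" \
       -H 'Content-Type: application/json' -d "$1" >/dev/null
}

notify() {  # notify <type> <title> <message>
  local type="$1" title="$2" msg="$3" payload
  tlog notify INVOKED "$type | $title"   # the interface was called
  if [ "${DRY_RUN:-0}" = "1" ]; then
    tlog notify DRY "$type"              # called, nothing sent
    return 0
  fi
  payload="$(jq -n --arg t "$type" --arg ti "$title" --arg m "$msg" \
    '{type:$t,title:$ti,message:$m}')"
  if send_notification "$payload"; then
    tlog notify OK "$type"               # the implementation sent it
  else
    tlog notify ERR "$type"              # the implementation failed
    return 1
  fi
}
```

The INVOKED line is written before any branching, so it is recorded whether the call ends up sent, skipped, or failed.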

The log separates two questions that side effects conflate:

... | notify | INVOKED | SIGNAL | Morning briefing   # the interface was called
... | notify | OK      | SIGNAL HTTP 200             # the implementation sent it
... | notify | DRY     | SIGNAL                      # called, but did not send (DRY_RUN)

INVOKED records that the model used the tool, independent of any outcome; OK/DRY/ERR record what the implementation did. Because the model depends on the interface rather than the implementation, the same routine can run in a shadow mode, where notify only logs and never sends, with no change in the model's behavior. The boundary log is also the reliable way to audit a past run: it records what executed, not merely what the skill instructed.
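Auditing a past run can be as simple as counting events per tool. The sketch below assumes the "timestamp | tool | event | detail" layout from the log lines above; `audit_log` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# audit_log <log-file>: per tool, count interface calls and outcomes.
audit_log() {
  awk -F'|' '
    {
      tool = $2; event = $3
      gsub(/^[ \t]+|[ \t]+$/, "", tool)    # trim column padding
      gsub(/^[ \t]+|[ \t]+$/, "", event)
      if (tool == "") next
      seen[tool] = 1
      count[tool, event]++
    }
    END {
      for (t in seen)
        printf "%s invoked=%d ok=%d dry=%d err=%d\n", t,
          count[t, "INVOKED"], count[t, "OK"], count[t, "DRY"], count[t, "ERR"]
    }
  ' "$1" | sort
}
```

A tool whose invoked count is higher than the sum of its outcomes was called but never recorded what happened, which is worth a closer look.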

This is a pattern, not a framework — and that's its honest limit: nothing enforces it at runtime. There's no base class, no inversion of control, nothing that

Go back to the suit. You upgrade the brain by adopting a better model — that's not code you write,

it's a model you swap in. Your day-to-day engineering goes into the equipment: what the brain can

discover, reach for, and be observed using. And because the brain depends on interfaces — a loadout

of named tools — the suit is model-agnostic: change the model and the same loadout still fits. A

self-describing, observable loadout is precisely how the brain takes the wheel: it wakes,

downloads the tools it's allowed, sees what it can do, and acts — and you can watch it do so at the

interface, not by guessing from side effects. The system stops being a program that occasionally

calls a model, and becomes a suit a capable model wears.

--describe line and a boundary log (INVOKED, OK/DRY/ERR). Side-effecting tools support a DRY_RUN. Runnable examples are in examples/.

source & further reading

dev.to (original article)