If you are still SSH-ing into four boxes to confirm last night's dump actually
ran, automated database backups across servers are exactly the kind of chore
that should be running on autopilot β not eating your evenings. In this guide we
wire up postgres backup automation for a small fleet (one primary, two
streaming replicas, an offsite backup box) and drive the whole thing from a
single AI interface using the MCP server
remote-agents. No agent
In short:every database host runs a lightweight agent connected to an
encrypted relay room. Your AI assistant (Claude or opencode) sees the whole
fleet as one machine and calls tools likeschedule_add
,fleet_exec
,
file_stat
,fleet_git
,send_file
, andset_mode
. Cron jobs liveon the, so a nightly
hostpg_dump
keeps running even if the relay link drops.
Payloads are end-to-end encrypted (AES-GCM-256) β the relay forwards
ciphertext blind.
Let's take a realistic small-SaaS database tier: a Postgres primary, two
streaming replicas for read scaling and failover, and an offsite box that only
stores compressed dumps. We'll also nod to a Windows SQL Server host at the end
to show the cross-platform story.
| Host | Role | Tag | Stack |
|---|---|---|---|
db-primary |
|||
| Postgres primary (writes) | db,primary |
||
| PostgreSQL 16, Linux | |||
db-replica-1 |
|||
| Streaming replica | db,replica |
||
| PostgreSQL 16, Linux | |||
db-replica-2 |
|||
| Streaming replica | db,replica |
||
| PostgreSQL 16, Linux | |||
backup-box |
|||
| Offsite dump target | backup |
||
| rsync, gzip, Linux | |||
sql-win |
|||
| SQL Server (optional) | db,windows |
||
| SQL Server 2022, Windows |
All agents join one relay room (say dbfleet
). Tags let you address a group
in a single call: target="db,replica"
hits both replicas, target="os:linux"
hits every Linux box, and target="all"
sweeps the whole fleet. Note that
multi-tag targets match a host carrying either tag, so db,primary
resolves
to every database host.
βββββββββββββββ AI (Claude / opencode) βββββββββββββββ
β remote-agents (MCP, stdio) β
βββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββ
β wss:// (E2E-encrypted)
βββββββββ΄βββββββββ relay (CF Worker or self-hosted)
β room=dbfleet β
βββββββββββββ¬ββββββββ΄βββββ¬βββββββββββββ¬ββββββββββββββ
db-primary db-replica-1 db-replica-2 backup-box sql-win
(db,primary) (db,replica) (db,replica) (backup) (db,windows)
The relay is interchangeable: use the hosted Cloudflare Worker, or run your own
Rust relay with remote-agents-relay --bind 0.0.0.0:8080
and point agents at
ws://your-host:8080
. Either way the relay only ever sees encrypted frames.
On each host you install the package once and start the agent with the right
tags. For 24/7 database servers you'll want the background service form so the
agent survives reboots:
npm i -g remote-agents
remote-agents run --relay wss://<relay> --room dbfleet --token <secret> \
--name db-primary --tags db,primary
remote-agents run ... --name db-replica-1 --tags db,replica
remote-agents run ... --name db-replica-2 --tags db,replica
remote-agents run ... --name backup-box --tags backup
remote-agents install --relay wss://<relay> --room dbfleet --token <secret> \
--name db-primary --tags db,primary
Then confirm the whole fleet is online. Ask your AI:
"List the agents in the room."
Under the hood that calls ** list_agents**, which returns each peer's OS family,
update_available
flag when a newer agentAlso read: Run a security audit across your whole fleet
| # Recipe | What it does |
|---|
- Plan-mode look |
Tag db hosts, set them to read-only
plan, and inspect Postgres safely before touching anything. | - Nightly pg_dump |
schedule_adda0 3 * * * *host-local cron that dumps + gzips and prunes anything older than 14 days. | - Verify freshness |
fleet_exec+file_statto confirm the dump exists, its size, and its mtime on every host at once. | - Replica migration |
fleet_gitpull migrations +fleet_execacrossdb,replicawith per-host results so a failed node is obvious. | - Lag / health check |
fleet_exec target="db,replica"runningpg_stat_replicationto catch lag and broken streaming. | - Read prod config |
read_filea production.envinplanmode β zero write risk. | - Ship a dump |
send_filea dump from primary tobackup-boxover a direct UDP channel, SHA-256 verified. |
Database fleets are where "I'll just SSH in real quick" goes to die. A single
forgotten retention prune fills a disk; a silent pg_dump
failure isn't noticed
until the day you need the dump; a schema migration applied to the primary but
not the replicas causes mysterious read errors hours later. Here's where driving
the fleet through one AI interface pays off:
schedule_add
installs the cron 0 3 * * * *
dump runs even if the relay link
is down, your laptop is closed, or the AI session has long ended. The schedule
is not tied to your connection.fleet_exec
and file_stat
give you the dump
size and mtime on all four boxes in a single round trip instead of four SSH
sessions. One failing host does not sink the batch β it just shows up red in
the aggregated result.fleet_git
fleet_exec
against the
db,replica
tag apply the same migration to both replicas and report each one
separately, so a half-applied change is impossible to miss.plan
mode makes a host read-only, so you can pull
a prod .env
or run pg_stat_replication
during an incident with no chance of
fat-fingering a write.target="os:linux"
hits the Postgres boxes; a
Windows SQL Server host joins the same room and answers os:windows
targeting
with a sqlcmd
backup instead of pg_dump
.This is not a replacement for a declarative backup orchestrator or a managed
RDS-style service. It shines for self-hosted databases, small-to-mid fleets,
dev/staging tiers, and the operational glue around backups that nobody wants to
maintain in YAML.
Before automating anything, look first and touch nothing. Set every database
host to ** plan** mode β read-only
read_file
, git_status
, and safe exec
set_mode target="db,replica" mode=plan
set_mode agent_id=db-primary mode=plan
fleet_exec target="db,primary" command="systemctl is-active postgresql && psql -tAc 'SELECT version();'"
fleet_exec target="db,primary" command="du -sh /var/lib/postgresql/16/main; df -h /var"
list_dir agent_id=backup-box path=/srv/backups
Tip:plan
mode still allows a read-onlyexec
, sopsql -tAc 'SELECT ...'
anddu -sh
work fine. The moment a command would write, the agent rejects it.
Start every incident here β you literally cannot break prod fromplan
.
The aggregated reply comes back per host: you'll see db-primary
reporting
active
and a 240 GB data directory, while a replica that's catching up might
report differently. That per-host shape is the whole point β no guessing which
box you're looking at.
This is the core of scheduled pg_dump automation. We install a host-local
cron on db-primary
that dumps the database, gzips it, and prunes anything
older than 14 days. Because schedule_add
writes the cron to the host, it
keeps firing at 03:00 every night whether or not anyone is connected.
Remember the cron is a 6-field spec β sec min hour day month dow
β so
"3 AM nightly" is 0 0 3 * * *
.
edit
so the agent may create the dump file and the wrapper script.
pg_dump -Fc -U postgres app_prod \
| gzip > /srv/backups/app_prod-$(date +\%F).sql.gz \
&& find /srv/backups -name 'app_prod-*.sql.gz' -mtime +14 -delete
set_mode agent_id=db-primary mode=edit
schedule_add agent_id=db-primary name=nightly-pgdump \
cron="0 0 3 * * *" \
command="pg_dump -Fc -U postgres app_prod | gzip > /srv/backups/app_prod-$(date +%F).sql.gz && find /srv/backups -name 'app_prod-*.sql.gz' -mtime +14 -delete"
schedule_list agent_id=db-primary
The -Fc
custom format gives you a compressed, pg_restore
-friendly archive; a
240 GB cluster typically lands around a 30β45 GB gzipped dump depending on how
much of it is indexes and TOAST. The -mtime +14 -delete
clause keeps roughly
two weeks of dumps and nothing more, so the disk doesn't quietly fill.
Note:for amysqldump cronthe only thing that changes is the command β
mysqldump --single-transaction app_prod | gzip > ...
β wrapped in the exact
sameschedule_add
. And for the Windows SQL Server host you'd schedule a
sqlcmd -Q "BACKUP DATABASE ..."
instead. The scheduler, retention, and
verification flow are identical across all of them.
To remove the schedule later (say you've migrated to WAL archiving):
schedule_remove agent_id=db-primary name=nightly-pgdump
A backup you never check is a backup you don't have. This recipe answers the only
question that matters at 9 AM: did last night's dump actually run, and is it the right size, on every host? We combine
fleet_exec
to find the newest dump withfile_stat
to read its exact size and mtime.file_stat
that file for an authoritative size and modification time.
fleet_exec target="db,primary" \
command="ls -t /srv/backups/app_prod-*.sql.gz 2>/dev/null | head -n1"
file_stat agent_id=db-primary path=/srv/backups/app_prod-2026-06-18.sql.gz
fleet_exec target="db,primary" \
command="find /srv/backups -name 'app_prod-*.sql.gz' -mmin -1560 -printf '%p %s bytes\n' || echo NO_FRESH_BACKUP"
file_stat
returns the size and mtime directly, which is more trustworthy than
parsing ls
output. The -mmin -1560
window (26 hours) gives the 03:00 job some
slack β anything that didn't write a fresh file in that window prints
NO_FRESH_BACKUP
, and because results are aggregated per host, a replica
that skipped its dump stands out instantly while the healthy boxes report a clean
~42 GB
file.
Important:size is your early-warning signal. A dump that suddenly drops
from 42 GB to 200 KB almost always meanspg_dump
errored out mid-run (bad
credentials, a dropped connection, a full disk) and gzipped only the error.
Watching size over time catches silent failures that a green exit code hides.
You can fold this into a once-a-day self-check by wrapping the find
in another
schedule_add
that appends to a log β same host-local cron pattern as Recipe 2.
Also read: Build a cross-platform CI test farm
Replicas in database fleet management drift the moment a migration lands on
one box but not another. With fleet_git
to pull the migration repo and
fleet_exec
to apply it across the db,replica
tag, both replicas move in
lockstep and you get a per-host verdict.
On true streaming replicas you don't run DDL directly (they're read-only and
replay WAL from the primary). This recipe fits the common real-world setup where
"replica" hosts also run migration tooling against a logical target, or where you
roll out an out-of-band maintenance script. Adjust to your topology.
fleet_git target="db,replica" op=pull repo=/srv/migrations remote=origin branch=main
fleet_exec target="db,replica" \
command="cd /srv/migrations && ./run_migrations.sh 2>&1 | tail -n 5 || echo MIGRATION_FAILED"
fleet_exec target="db,replica" \
command="psql -tAc 'SELECT max(version) FROM schema_migrations;'"
The payoff is the per-host aggregation: if db-replica-2
returns
MIGRATION_FAILED
while db-replica-1
reports the new version, you know exactly
which node to fix β no scrolling through interleaved SSH output trying to figure
out whose error you're reading. One failing replica does not block the batch from
finishing on the healthy one.
Tip:put the hosts inedit
mode (notbypass
) for migrations. Inedit
,
the agent auto-creates abackup before any overwrite, so a botched
write_file
to a config or script leaves you the original. Drop back toplan
the moment you're done.
The single most useful replica maintenance check is replication lag: how far
behind the primary each replica is. From the primary you query
pg_stat_replication
; from each replica you check pg_last_wal_replay_lag
.
Both are one fleet_exec
away.
fleet_exec target="db,primary" \
command="psql -xtAc \"SELECT client_addr, state, pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes FROM pg_stat_replication;\""
fleet_exec target="db,replica" \
command="psql -tAc 'SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;'"
fleet_exec target="db,replica" \
command="psql -tAc 'SELECT CASE WHEN pg_is_in_recovery() THEN 1 ELSE 0 END;' | grep -q 1 && echo OK_RECOVERY || echo NOT_A_REPLICA"
Run from plan
mode β these are all read-only queries, so there's zero risk even
on a busy primary. The first query, sent to db-primary
, shows you both replicas
as rows with their state
(streaming
is what you want) and lag_bytes
. The
second, fanned out across the db,replica
tag, returns each replica's replay lag
side by side. A replica that has fallen out of streaming shows up either as a
missing row in the primary's view or a ballooning replay_lag
.
Note:a healthy LAN replica usually sits under a few hundred milliseconds
of replay lag. A sudden jump to tens of seconds (or a replica vanishing from
pg_stat_replication
entirely) means streaming broke β often a WAL retention
or network issue. Catching it here, across both replicas in one call, beats
discovering it when a read query returns stale data.
Sometimes you just need to confirm a connection string or a backup target path
without any chance of editing it. With the host in ** plan** mode,
read_file
plan
mode.read_file
the config.exec
.
set_mode agent_id=db-primary mode=plan
read_file agent_id=db-primary path=/srv/app/.env
exec agent_id=db-primary command="grep -E '^(PGHOST|PGDATABASE|BACKUP_DIR)=' /srv/app/.env"
Because the agent is read-only, even if you (or the AI) followed up with a
write_file
, it would be rejected. On top of that, a hard deny-list β
covering paths like /etc/shadow
and /boot
β applies on every host in
every mode, including bypass
, so the truly sensitive system files are never
readable or writable through the agent regardless of how you set things up.
Important:plan
mode is the right default for production database hosts.
Keep them inplan
day to day and only flip a single host toedit
for the
specific window you're changing something β then flip it straight back. It
turns "I'll be careful" into "I literally can't write right now."
Finally, get the dump off the primary and onto the offsite backup-box
.
send_file
streams a file hostβhost over a direct UDP data channel (opened on
demand, with automatic relay fallback) and verifies it end-to-end with a
SHA-256 checksum. Because the receiving side performs a write, the
destination host needs edit
or bypass
.
backup-box
in edit
so it can write the incoming file.send_file
the latest dump from db-primary
to backup-box
.file_stat
on the receiver.
set_mode agent_id=backup-box mode=edit
send_file agent_id=db-primary \
path=/srv/backups/app_prod-2026-06-18.sql.gz \
to=backup-box \
dest=/srv/offsite/app_prod-2026-06-18.sql.gz
file_stat agent_id=backup-box path=/srv/offsite/app_prod-2026-06-18.sql.gz
The transfer goes peer-to-peer where the network allows, so a 42 GB dump doesn't
have to round-trip through the relay β and if a direct UDP path can't be
established, it transparently falls back to the relay so the copy still
completes. The SHA-256 check means you find out about a corrupted transfer
immediately, not when a restore fails three weeks later. If you'd rather pull
from the receiving side, transfer_get
is the mirror-image tool.
Tip:in the browser fleet-chat panel there's a Files view that does the same
move with a live progress bar, plus a binary-safe chunked download of any file
through the relay. Handy when you want to eyeball a dump's size or grab one to
your laptop without dropping to the CLI.
Set backup-box
back to plan
when the copy is done, so the offsite store is
read-only at rest:
set_mode agent_id=backup-box mode=plan
It's the same idea, but the cron is installed and managed through one AI
interface across the whole fleet, and schedule_add
registers it on the host
so it survives relay outages and reboots. The difference is operational: you add,
list, verify, and remove backup schedules on four boxes from one place, and you
get per-host confirmation that each dump actually ran, instead of trusting four
independent crontabs you never look at.
Yes. schedule_add
writes a host-local cron that lives entirely on the database
server. The relay link and your AI session are only needed to create, list, or remove the schedule β not to run it. Your
0 0 3 * * *
dump fires at 03:00Start every host in plan
mode, which is read-only β you can run
pg_stat_replication
, read_file
a .env
, and check dump freshness with zero
write risk. Flip a single host to edit
only for the specific change you're
making (edit
auto-backs-up before overwriting), and a hard deny-list on paths
like /etc/shadow
and /boot
applies even in bypass
. All command payloads and
results are end-to-end encrypted; the relay forwards ciphertext and never sees
your data or keys.
Yes β the workflow is database-agnostic. Swap the command inside schedule_add
:
mysqldump --single-transaction ... | gzip
for MySQL, or
sqlcmd -Q "BACKUP DATABASE ..."
on a Windows host targeted via os:windows
.
The scheduling, 14-day retention prune, freshness verification, and SHA-256
transfer all stay identical.
Run ** fleet_update_check**, which tells you which idle hosts have a newer agent
npm i -g remote-agents@latest
on those hosts. ItOne MCP interface, five hosts, zero manual SSH sessions: a nightly scheduled pg_dump with 14-day retention runs host-local on the primary, freshness is
file_stat
, replicas get migrations and lag checksplan
mode, and the dump lands offsiteInstall:
npm i -g remote-agents
β
[package on npm]Β·
[source and docs]