Automate Database Backups Across a Server Fleet with AI: 7 Recipes for 2026

wpnews.pro

If you are still SSH-ing into four boxes to confirm last night's dump actually

ran, automated database backups across servers are exactly the kind of chore

that should be running on autopilot — not eating your evenings. In this guide we

wire up postgres backup automation for a small fleet (one primary, two

streaming replicas, an offsite backup box) and drive the whole thing from a

single AI interface using the MCP server

remote-agents. No agent

In short:every database host runs a lightweight agent connected to an

encrypted relay room. Your AI assistant (Claude or opencode) sees the whole

fleet as one machine and calls tools likeschedule_add

,fleet_exec

,

file_stat

,fleet_git

,send_file

, andset_mode

. Cron jobs liveon the, so a nightly

hostpg_dump

keeps running even if the relay link drops.

Payloads are end-to-end encrypted (AES-GCM-256) — the relay forwards

ciphertext blind.

Let's take a realistic small-SaaS database tier: a Postgres primary, two

streaming replicas for read scaling and failover, and an offsite box that only

stores compressed dumps. We'll also nod to a Windows SQL Server host at the end

to show the cross-platform story.

Host	Role	Tag	Stack
`db-primary`
Postgres primary (writes)	`db,primary`
PostgreSQL 16, Linux
`db-replica-1`
Streaming replica	`db,replica`
PostgreSQL 16, Linux
`db-replica-2`
Streaming replica	`db,replica`
PostgreSQL 16, Linux
`backup-box`
Offsite dump target	`backup`
rsync, gzip, Linux
`sql-win`
SQL Server (optional)	`db,windows`
SQL Server 2022, Windows

All agents join one relay room (say dbfleet

). Tags let you address a group

in a single call: target="db,replica"

hits both replicas, target="os:linux"

hits every Linux box, and target="all"

sweeps the whole fleet. Note that

multi-tag targets match a host carrying either tag, so db,primary

resolves

to every database host.

            ┌────────────── AI (Claude / opencode) ──────────────┐
            │            remote-agents (MCP, stdio)               │
            └───────────────────────┬────────────────────────────┘
                                    │ wss:// (E2E-encrypted)
                            ┌───────┴────────┐  relay (CF Worker or self-hosted)
                            │   room=dbfleet  │
        ┌───────────┬───────┴────┬────────────┬─────────────┐
   db-primary   db-replica-1  db-replica-2  backup-box   sql-win
  (db,primary)  (db,replica)  (db,replica)  (backup)    (db,windows)

The relay is interchangeable: use the hosted Cloudflare Worker, or run your own

Rust relay with remote-agents-relay --bind 0.0.0.0:8080

and point agents at

ws://your-host:8080

. Either way the relay only ever sees encrypted frames.

On each host you install the package once and start the agent with the right

tags. For 24/7 database servers you'll want the background service form so the

agent survives reboots:

npm i -g remote-agents

remote-agents run --relay wss://<relay> --room dbfleet --token <secret> \
  --name db-primary --tags db,primary

remote-agents run ... --name db-replica-1 --tags db,replica
remote-agents run ... --name db-replica-2 --tags db,replica

remote-agents run ... --name backup-box --tags backup

remote-agents install --relay wss://<relay> --room dbfleet --token <secret> \
  --name db-primary --tags db,primary

Then confirm the whole fleet is online. Ask your AI:

"List the agents in the room."

Under the hood that calls ** list_agents**, which returns each peer's OS family,

update_available

flag when a newer agentAlso read: Run a security audit across your whole fleet

# Recipe	What it does

Plan-mode look | Tag db hosts, set them to read-only plan , and inspect Postgres safely before touching anything. |
Nightly pg_dump | schedule_add a 0 3 * * * * host-local cron that dumps + gzips and prunes anything older than 14 days. |
Verify freshness | fleet_exec + file_stat to confirm the dump exists, its size, and its mtime on every host at once. |
Replica migration | fleet_git pull migrations + fleet_exec across db,replica with per-host results so a failed node is obvious. |
Lag / health check | fleet_exec target="db,replica" running pg_stat_replication to catch lag and broken streaming. |
Read prod config | read_file a production .env in plan mode — zero write risk. |
Ship a dump | send_file a dump from primary to backup-box over a direct UDP channel, SHA-256 verified. |

Database fleets are where "I'll just SSH in real quick" goes to die. A single

forgotten retention prune fills a disk; a silent pg_dump

failure isn't noticed

until the day you need the dump; a schema migration applied to the primary but

not the replicas causes mysterious read errors hours later. Here's where driving

the fleet through one AI interface pays off:

schedule_add

installs the cron 0 3 * * * *

dump runs even if the relay link is down, your laptop is closed, or the AI session has long ended. The schedule is not tied to your connection.fleet_exec

and file_stat

give you the dump size and mtime on all four boxes in a single round trip instead of four SSH sessions. One failing host does not sink the batch — it just shows up red in the aggregated result.fleet_git

fleet_exec

against the db,replica

tag apply the same migration to both replicas and report each one separately, so a half-applied change is impossible to miss.plan

mode makes a host read-only, so you can pull a prod .env

or run pg_stat_replication

during an incident with no chance of fat-fingering a write.target="os:linux"

hits the Postgres boxes; a Windows SQL Server host joins the same room and answers os:windows

targeting with a sqlcmd

backup instead of pg_dump

.This is not a replacement for a declarative backup orchestrator or a managed

RDS-style service. It shines for self-hosted databases, small-to-mid fleets,

dev/staging tiers, and the operational glue around backups that nobody wants to

maintain in YAML.

Before automating anything, look first and touch nothing. Set every database

host to ** plan** mode — read-only

read_file

, git_status

, and safe exec

set_mode    target="db,replica"   mode=plan
set_mode    agent_id=db-primary   mode=plan

fleet_exec  target="db,primary"   command="systemctl is-active postgresql && psql -tAc 'SELECT version();'"
fleet_exec  target="db,primary"   command="du -sh /var/lib/postgresql/16/main; df -h /var"
list_dir    agent_id=backup-box   path=/srv/backups

Tip:plan

mode still allows a read-onlyexec

, sopsql -tAc 'SELECT ...'

anddu -sh

work fine. The moment a command would write, the agent rejects it.

Start every incident here — you literally cannot break prod fromplan

.

The aggregated reply comes back per host: you'll see db-primary

reporting

active

and a 240 GB data directory, while a replica that's catching up might

report differently. That per-host shape is the whole point — no guessing which

box you're looking at.

This is the core of scheduled pg_dump automation. We install a host-local

cron on db-primary

that dumps the database, gzips it, and prunes anything

older than 14 days. Because schedule_add

writes the cron to the host, it

keeps firing at 03:00 every night whether or not anyone is connected.

Remember the cron is a 6-field spec — sec min hour day month dow

— so

"3 AM nightly" is 0 0 3 * * *

.

edit

so the agent may create the dump file and the wrapper script.

pg_dump -Fc -U postgres app_prod \
  | gzip > /srv/backups/app_prod-$(date +\%F).sql.gz \
  && find /srv/backups -name 'app_prod-*.sql.gz' -mtime +14 -delete
set_mode      agent_id=db-primary  mode=edit

schedule_add  agent_id=db-primary  name=nightly-pgdump \
  cron="0 0 3 * * *" \
  command="pg_dump -Fc -U postgres app_prod | gzip > /srv/backups/app_prod-$(date +%F).sql.gz && find /srv/backups -name 'app_prod-*.sql.gz' -mtime +14 -delete"

schedule_list agent_id=db-primary

The -Fc

custom format gives you a compressed, pg_restore

-friendly archive; a

240 GB cluster typically lands around a 30–45 GB gzipped dump depending on how

much of it is indexes and TOAST. The -mtime +14 -delete

clause keeps roughly

two weeks of dumps and nothing more, so the disk doesn't quietly fill.

Note:for amysqldump cronthe only thing that changes is the command —

mysqldump --single-transaction app_prod | gzip > ...

— wrapped in the exact

sameschedule_add

. And for the Windows SQL Server host you'd schedule a

sqlcmd -Q "BACKUP DATABASE ..."

instead. The scheduler, retention, and

verification flow are identical across all of them.

To remove the schedule later (say you've migrated to WAL archiving):

schedule_remove  agent_id=db-primary  name=nightly-pgdump

A backup you never check is a backup you don't have. This recipe answers the only

question that matters at 9 AM: did last night's dump actually run, and is it the right size, on every host? We combine

fleet_exec

to find the newest dump withfile_stat

to read its exact size and mtime.file_stat

that file for an authoritative size and modification time.

fleet_exec  target="db,primary"  \
  command="ls -t /srv/backups/app_prod-*.sql.gz 2>/dev/null | head -n1"

file_stat   agent_id=db-primary  path=/srv/backups/app_prod-2026-06-18.sql.gz

fleet_exec  target="db,primary"  \
  command="find /srv/backups -name 'app_prod-*.sql.gz' -mmin -1560 -printf '%p %s bytes\n' || echo NO_FRESH_BACKUP"

file_stat

returns the size and mtime directly, which is more trustworthy than

parsing ls

output. The -mmin -1560

window (26 hours) gives the 03:00 job some

slack — anything that didn't write a fresh file in that window prints

NO_FRESH_BACKUP

, and because results are aggregated per host, a replica

that skipped its dump stands out instantly while the healthy boxes report a clean

~42 GB

file.

Important:size is your early-warning signal. A dump that suddenly drops

from 42 GB to 200 KB almost always meanspg_dump

errored out mid-run (bad

credentials, a dropped connection, a full disk) and gzipped only the error.

Watching size over time catches silent failures that a green exit code hides.

You can fold this into a once-a-day self-check by wrapping the find

in another

schedule_add

that appends to a log — same host-local cron pattern as Recipe 2.

Also read: Build a cross-platform CI test farm

Replicas in database fleet management drift the moment a migration lands on

one box but not another. With fleet_git

to pull the migration repo and

fleet_exec

to apply it across the db,replica

tag, both replicas move in

lockstep and you get a per-host verdict.

On true streaming replicas you don't run DDL directly (they're read-only and

replay WAL from the primary). This recipe fits the common real-world setup where

"replica" hosts also run migration tooling against a logical target, or where you

roll out an out-of-band maintenance script. Adjust to your topology.

fleet_git   target="db,replica"  op=pull  repo=/srv/migrations  remote=origin  branch=main

fleet_exec  target="db,replica"  \
  command="cd /srv/migrations && ./run_migrations.sh 2>&1 | tail -n 5 || echo MIGRATION_FAILED"

fleet_exec  target="db,replica"  \
  command="psql -tAc 'SELECT max(version) FROM schema_migrations;'"

The payoff is the per-host aggregation: if db-replica-2

returns

MIGRATION_FAILED

while db-replica-1

reports the new version, you know exactly

which node to fix — no scrolling through interleaved SSH output trying to figure

out whose error you're reading. One failing replica does not block the batch from

finishing on the healthy one.

Tip:put the hosts inedit

mode (notbypass

) for migrations. Inedit

,

the agent auto-creates abackup before any overwrite, so a botched

write_file

to a config or script leaves you the original. Drop back toplan

the moment you're done.

The single most useful replica maintenance check is replication lag: how far

behind the primary each replica is. From the primary you query

pg_stat_replication

; from each replica you check pg_last_wal_replay_lag

.

Both are one fleet_exec

away.

fleet_exec  target="db,primary"  \
  command="psql -xtAc \"SELECT client_addr, state, pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes FROM pg_stat_replication;\""

fleet_exec  target="db,replica"  \
  command="psql -tAc 'SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;'"

fleet_exec  target="db,replica"  \
  command="psql -tAc 'SELECT CASE WHEN pg_is_in_recovery() THEN 1 ELSE 0 END;' | grep -q 1 && echo OK_RECOVERY || echo NOT_A_REPLICA"

Run from plan

mode — these are all read-only queries, so there's zero risk even

on a busy primary. The first query, sent to db-primary

, shows you both replicas

as rows with their state

(streaming

is what you want) and lag_bytes

. The

second, fanned out across the db,replica

tag, returns each replica's replay lag

side by side. A replica that has fallen out of streaming shows up either as a

missing row in the primary's view or a ballooning replay_lag

.

Note:a healthy LAN replica usually sits under a few hundred milliseconds

of replay lag. A sudden jump to tens of seconds (or a replica vanishing from

pg_stat_replication

entirely) means streaming broke — often a WAL retention

or network issue. Catching it here, across both replicas in one call, beats

discovering it when a read query returns stale data.

Sometimes you just need to confirm a connection string or a backup target path

without any chance of editing it. With the host in ** plan** mode,

read_file

plan

mode.read_file

the config.exec

.

set_mode   agent_id=db-primary  mode=plan

read_file  agent_id=db-primary  path=/srv/app/.env

exec       agent_id=db-primary  command="grep -E '^(PGHOST|PGDATABASE|BACKUP_DIR)=' /srv/app/.env"

Because the agent is read-only, even if you (or the AI) followed up with a

write_file

, it would be rejected. On top of that, a hard deny-list —

covering paths like /etc/shadow

and /boot

— applies on every host in

every mode, including bypass

, so the truly sensitive system files are never

readable or writable through the agent regardless of how you set things up.

Important:plan

mode is the right default for production database hosts.

Keep them inplan

day to day and only flip a single host toedit

for the

specific window you're changing something — then flip it straight back. It

turns "I'll be careful" into "I literally can't write right now."

Finally, get the dump off the primary and onto the offsite backup-box

.

send_file

streams a file host→host over a direct UDP data channel (opened on

demand, with automatic relay fallback) and verifies it end-to-end with a

SHA-256 checksum. Because the receiving side performs a write, the

destination host needs edit

or bypass

.

backup-box

in edit

so it can write the incoming file.send_file

the latest dump from db-primary

to backup-box

.file_stat

on the receiver.

set_mode     agent_id=backup-box  mode=edit

send_file    agent_id=db-primary  \
  path=/srv/backups/app_prod-2026-06-18.sql.gz \
  to=backup-box \
  dest=/srv/offsite/app_prod-2026-06-18.sql.gz

file_stat    agent_id=backup-box  path=/srv/offsite/app_prod-2026-06-18.sql.gz

The transfer goes peer-to-peer where the network allows, so a 42 GB dump doesn't

have to round-trip through the relay — and if a direct UDP path can't be

established, it transparently falls back to the relay so the copy still

completes. The SHA-256 check means you find out about a corrupted transfer

immediately, not when a restore fails three weeks later. If you'd rather pull

from the receiving side, transfer_get

is the mirror-image tool.

Tip:in the browser fleet-chat panel there's a Files view that does the same

move with a live progress bar, plus a binary-safe chunked download of any file

through the relay. Handy when you want to eyeball a dump's size or grab one to

your laptop without dropping to the CLI.

Set backup-box

back to plan

when the copy is done, so the offsite store is

read-only at rest:

set_mode  agent_id=backup-box  mode=plan

It's the same idea, but the cron is installed and managed through one AI

interface across the whole fleet, and schedule_add

registers it on the host

so it survives relay outages and reboots. The difference is operational: you add,

list, verify, and remove backup schedules on four boxes from one place, and you

get per-host confirmation that each dump actually ran, instead of trusting four

independent crontabs you never look at.

Yes. schedule_add

writes a host-local cron that lives entirely on the database

server. The relay link and your AI session are only needed to create, list, or remove the schedule — not to run it. Your

0 0 3 * * *

dump fires at 03:00Start every host in plan

mode, which is read-only — you can run

pg_stat_replication

, read_file

a .env

, and check dump freshness with zero

write risk. Flip a single host to edit

only for the specific change you're

making (edit

auto-backs-up before overwriting), and a hard deny-list on paths

like /etc/shadow

and /boot

applies even in bypass

. All command payloads and

results are end-to-end encrypted; the relay forwards ciphertext and never sees

your data or keys.

Yes — the workflow is database-agnostic. Swap the command inside schedule_add

:

mysqldump --single-transaction ... | gzip

for MySQL, or

sqlcmd -Q "BACKUP DATABASE ..."

on a Windows host targeted via os:windows

.

The scheduling, 14-day retention prune, freshness verification, and SHA-256

transfer all stay identical.

Run ** fleet_update_check**, which tells you which idle hosts have a newer agent

npm i -g remote-agents@latest

on those hosts. ItOne MCP interface, five hosts, zero manual SSH sessions: a nightly scheduled pg_dump with 14-day retention runs host-local on the primary, freshness is

file_stat

, replicas get migrations and lag checksplan

mode, and the dump lands offsiteInstall:

npm i -g remote-agents

→

[package on npm]·

[source and docs]

source & further reading

dev.to — original article Cracking WMI-exec in Rust by turning impacket into a byte-level oracle Productionizing an MCP-Based AI Agent with Docker, Kubernetes, CI/CD, and Observability Beyond Single-Agent Loops: How We Built Multi-Agent Orchestration in Octo

Automate Database Backups Across a Server Fleet with AI: 7 Recipes for 2026

Run your AI side-project on zahid.host