{"slug": "automate-database-backups-across-a-server-fleet-with-ai-7-recipes-for-2026", "title": "Automate Database Backups Across a Server Fleet with AI: 7 Recipes for 2026", "summary": "A developer automated database backups across a fleet of servers using the remote-agents MCP server, which connects each host to an encrypted relay room. The setup includes a Postgres primary, two streaming replicas, and an offsite backup box, all controlled via a single AI interface like Claude or opencode. The system uses end-to-end encryption (AES-GCM-256) and supports cross-platform hosts including Windows SQL Server.", "body_md": "If you are still SSH-ing into four boxes to confirm last night's dump actually ran, **automated database backups across servers** are exactly the kind of chore that should be running on autopilot, not eating your evenings. In this guide we wire up **postgres backup automation** for a small fleet (one primary, two streaming replicas, an offsite backup box) and drive the whole thing from a single AI interface using the MCP server [remote-agents](https://www.npmjs.com/package/remote-agents).\n\nIn short: every database host runs a lightweight agent connected to an encrypted relay room. Your AI assistant (Claude or opencode) sees the whole fleet as one machine and calls tools like `schedule_add`, `fleet_exec`, `file_stat`, `fleet_git`, `send_file`, and `set_mode`. Cron jobs live on the host, so a nightly `pg_dump` keeps running even if the relay link drops. Payloads are end-to-end encrypted (AES-GCM-256): the relay forwards ciphertext blind.\n\nLet's take a realistic small-SaaS database tier: a Postgres primary, two streaming replicas for read scaling and failover, and an offsite box that only stores compressed dumps. 
We'll also nod to a Windows SQL Server host at the end to show the cross-platform story.\n\n| Host | Role | Tag | Stack |\n|---|---|---|---|\n| `db-primary` | Postgres primary (writes) | `db,primary` | PostgreSQL 16, Linux |\n| `db-replica-1` | Streaming replica | `db,replica` | PostgreSQL 16, Linux |\n| `db-replica-2` | Streaming replica | `db,replica` | PostgreSQL 16, Linux |\n| `backup-box` | Offsite dump target | `backup` | rsync, gzip, Linux |\n| `sql-win` | SQL Server (optional) | `db,windows` | SQL Server 2022, Windows |\n\nAll agents join one relay room (say `dbfleet`). Tags let you address a **group** in a single call: `target=\"db,replica\"` hits both replicas, `target=\"os:linux\"` hits every Linux box, and `target=\"all\"` sweeps the whole fleet. Note that multi-tag targets match a host carrying **either** tag, so `db,primary` resolves to every database host.\n\n```\n            ┌────────────── AI (Claude / opencode) ──────────────┐\n            │            remote-agents (MCP, stdio)               │\n            └───────────────────────┬────────────────────────────┘\n                                    │ wss:// (E2E-encrypted)\n                            ┌───────┴────────┐  relay (CF Worker or self-hosted)\n                            │   room=dbfleet  │\n        ┌───────────┬───────┴────┬────────────┬─────────────┐\n   db-primary   db-replica-1  db-replica-2  backup-box   sql-win\n  (db,primary)  (db,replica)  (db,replica)  (backup)    (db,windows)\n```\n\nThe relay is interchangeable: use the hosted Cloudflare Worker, or run your own Rust relay with `remote-agents-relay --bind 0.0.0.0:8080` and point agents at `ws://your-host:8080`. Either way the relay only ever sees encrypted frames.\n\nOn each host you install the package once and start the agent with the right tags. 
For 24/7 database servers you'll want the background service form so the agent survives reboots:\n\n```\n# once on every machine\nnpm i -g remote-agents\n\n# the Postgres primary\nremote-agents run --relay wss://<relay> --room dbfleet --token <secret> \\\n  --name db-primary --tags db,primary\n\n# the two replicas\nremote-agents run ... --name db-replica-1 --tags db,replica\nremote-agents run ... --name db-replica-2 --tags db,replica\n\n# the offsite backup target\nremote-agents run ... --name backup-box --tags backup\n\n# install as a systemd service for 24/7 hosts (instead of `run`)\nremote-agents install --relay wss://<relay> --room dbfleet --token <secret> \\\n  --name db-primary --tags db,primary\n```\n\nThen confirm the whole fleet is online. Ask your AI:\n\n\"List the agents in the room.\"\n\nUnder the hood that calls **list_agents**, which returns each peer's details, including its OS family and an `update_available` flag when a newer agent version is available.\n\n| # | Recipe | What it does |\n|---|---|---|\n| 1 | Plan-mode look | Tag db hosts, set them to read-only `plan`, and inspect Postgres safely before touching anything. |\n| 2 | Nightly pg_dump | `schedule_add` a `0 0 3 * * *` host-local cron that dumps + gzips and prunes anything older than 14 days. |\n| 3 | Verify freshness | `fleet_exec` + `file_stat` to confirm the dump exists, its size, and its mtime on every host at once. |\n| 4 | Replica migration | `fleet_git` pull migrations + `fleet_exec` across `db,replica` with per-host results so a failed node is obvious. |\n| 5 | Lag / health check | `fleet_exec target=\"db,replica\"` querying `pg_stat_replication` to catch lag and broken streaming. |\n| 6 | Read prod config | `read_file` a production `.env` in `plan` mode, with zero write risk. |\n| 7 | Ship a dump | `send_file` a dump from primary to `backup-box` over a direct UDP channel, SHA-256 verified. |\n\nDatabase fleets are where \"I'll just SSH in real quick\" goes to die. 
A single forgotten retention prune fills a disk; a silent `pg_dump` failure isn't noticed until the day you need the dump; a schema migration applied to the primary but not the replicas causes mysterious read errors hours later. Here's where driving the fleet through one AI interface pays off:\n\n`schedule_add` installs the cron on the host, so the `0 0 3 * * *` dump runs even if the relay link is down, your laptop is closed, or the AI session has long ended. The schedule is not tied to your connection.\n\n`fleet_exec` and `file_stat` give you the dump size and mtime on all four boxes in a single round trip instead of four SSH sessions. One failing host does not sink the batch; it just shows up red in the aggregated result.\n\n`fleet_git` + `fleet_exec` against the `db,replica` tag apply the same migration to both replicas and report each one separately, so a half-applied change is impossible to miss.\n\n`plan` mode makes a host read-only, so you can pull a prod `.env` or query `pg_stat_replication` during an incident with no chance of fat-fingering a write.\n\n`target=\"os:linux\"` hits the Postgres boxes; a Windows SQL Server host joins the same room and answers `os:windows` targeting with a `sqlcmd` backup instead of `pg_dump`.\n\nThis is not a replacement for a declarative backup orchestrator or a managed RDS-style service. It shines for self-hosted databases, small-to-mid fleets, dev/staging tiers, and the operational glue around backups that nobody wants to maintain in YAML.\n\nBefore automating anything, look first and touch nothing. 
Set every database host to **plan** mode, which permits only read-only `read_file`, `git_status`, and safe `exec`:\n\n```\nset_mode    target=\"db,replica\"   mode=plan\nset_mode    agent_id=db-primary   mode=plan\n\nfleet_exec  target=\"db,primary\"   command=\"systemctl is-active postgresql && psql -tAc 'SELECT version();'\"\nfleet_exec  target=\"db,primary\"   command=\"du -sh /var/lib/postgresql/16/main; df -h /var\"\nlist_dir    agent_id=backup-box   path=/srv/backups\n```\n\nTip: `plan` mode still allows a read-only `exec`, so `psql -tAc 'SELECT ...'` and `du -sh` work fine. The moment a command would write, the agent rejects it. Start every incident here: you literally cannot break prod from `plan`.\n\nThe aggregated reply comes back **per host**: you'll see `db-primary` reporting `active` and a 240 GB data directory, while a replica that's catching up might report differently. That per-host shape is the whole point: no guessing which box you're looking at.\n\nThis is the core of **scheduled pg_dump** automation. We install a host-local cron on `db-primary` that dumps the database, gzips it, and prunes anything older than 14 days. 
Because `schedule_add` writes the cron **to the host**, it keeps firing at 03:00 every night whether or not anyone is connected.\n\nRemember the cron is a **6-field** spec (`sec min hour day month dow`), so \"3 AM nightly\" is `0 0 3 * * *`.\n\nSet `db-primary` to `edit` so the agent may create the dump file and the wrapper script.\n\n```\n# the command the cron runs on db-primary (one line, gzip + 14-day prune)\npg_dump -Fc -U postgres app_prod \\\n  | gzip > /srv/backups/app_prod-$(date +\\%F).sql.gz \\\n  && find /srv/backups -name 'app_prod-*.sql.gz' -mtime +14 -delete\n\nset_mode      agent_id=db-primary  mode=edit\n\nschedule_add  agent_id=db-primary  name=nightly-pgdump \\\n  cron=\"0 0 3 * * *\" \\\n  command=\"pg_dump -Fc -U postgres app_prod | gzip > /srv/backups/app_prod-$(date +%F).sql.gz && find /srv/backups -name 'app_prod-*.sql.gz' -mtime +14 -delete\"\n\nschedule_list agent_id=db-primary\n```\n\nThe `-Fc` custom format gives you a compressed, `pg_restore`-friendly archive; a 240 GB cluster typically lands around a 30–45 GB gzipped dump depending on how much of it is indexes and TOAST. Note that `-Fc` output is already compressed and is not plain SQL, so the extra `gzip` pass saves little and the file must be decompressed before `pg_restore` can read it. The `-mtime +14 -delete` clause keeps roughly two weeks of dumps and nothing more, so the disk doesn't quietly fill.\n\nNote: for a mysqldump cron the only thing that changes is the command (`mysqldump --single-transaction app_prod | gzip > ...`), wrapped in the exact same `schedule_add`. And for the Windows SQL Server host you'd schedule a `sqlcmd -Q \"BACKUP DATABASE ...\"` instead. The scheduler, retention, and verification flow are identical across all of them.\n\nTo remove the schedule later (say you've migrated to WAL archiving):\n\n```\nschedule_remove  agent_id=db-primary  name=nightly-pgdump\n```\n\nA backup you never check is a backup you don't have. 
This recipe answers the only question that matters at 9 AM: **did last night's dump actually run, and is it the right size, on every host?** We combine `fleet_exec` to find the newest dump with `file_stat` to read its exact size and mtime: locate the newest dump first, then `file_stat` that file for an authoritative size and modification time.\n\n```\nfleet_exec  target=\"db,primary\"  \\\n  command=\"ls -t /srv/backups/app_prod-*.sql.gz 2>/dev/null | head -n1\"\n\nfile_stat   agent_id=db-primary  path=/srv/backups/app_prod-2026-06-18.sql.gz\n\nfleet_exec  target=\"db,primary\"  \\\n  command=\"find /srv/backups -name 'app_prod-*.sql.gz' -mmin -1560 -printf '%p %s bytes\\n' | grep . || echo NO_FRESH_BACKUP\"\n```\n\n`file_stat` returns the size and mtime directly, which is more trustworthy than parsing `ls` output. The `-mmin -1560` window (26 hours) gives the 03:00 job some slack. Because `find` exits 0 even when nothing matches, its output is piped through `grep .`, which fails on empty input, so any host that didn't write a fresh file in that window prints `NO_FRESH_BACKUP`. And because results are aggregated **per host**, a replica that skipped its dump stands out instantly while the healthy boxes report a clean `~42 GB` file.\n\nImportant: size is your early-warning signal. A dump that suddenly drops from 42 GB to 200 KB almost always means `pg_dump` errored out mid-run (bad credentials, a dropped connection, a full disk) and gzip compressed only a truncated stream. Watching size over time catches silent failures that a green exit code hides.\n\nYou can fold this into a once-a-day self-check by wrapping the `find` in another `schedule_add` that appends to a log, the same host-local cron pattern as Recipe 2.\n\nReplicas in **database fleet management** drift the moment a migration lands on one box but not another. 
With `fleet_git` to pull the migration repo and `fleet_exec` to apply it across the `db,replica` tag, both replicas move in lockstep and you get a per-host verdict.\n\nOn true streaming replicas you don't run DDL directly (they're read-only and replay WAL from the primary). This recipe fits the common real-world setup where \"replica\" hosts also run migration tooling against a logical target, or where you roll out an out-of-band maintenance script. Adjust to your topology.\n\n```\nfleet_git   target=\"db,replica\"  op=pull  repo=/srv/migrations  remote=origin  branch=main\n\nfleet_exec  target=\"db,replica\"  \\\n  command=\"(cd /srv/migrations && ./run_migrations.sh 2>&1 || echo MIGRATION_FAILED) | tail -n 5\"\n\nfleet_exec  target=\"db,replica\"  \\\n  command=\"psql -tAc 'SELECT max(version) FROM schema_migrations;'\"\n```\n\nThe failure marker is echoed inside the subshell, before `tail`, because a pipeline's exit status is that of its last command: with `... | tail -n 5 || echo MIGRATION_FAILED`, `tail` succeeds and the marker never prints.\n\nThe payoff is the **per-host aggregation**: if `db-replica-2` returns `MIGRATION_FAILED` while `db-replica-1` reports the new version, you know exactly which node to fix, with no scrolling through interleaved SSH output trying to figure out whose error you're reading. One failing replica does not block the batch from finishing on the healthy one.\n\nTip: put the hosts in `edit` mode (not `bypass`) for migrations. In `edit`, the agent auto-creates a backup before any overwrite, so a botched `write_file` to a config or script leaves you the original. Drop back to `plan` the moment you're done.\n\nThe single most useful **replica maintenance** check is replication lag: how far behind the primary each replica is. 
From the primary you query `pg_stat_replication`; from each replica you check `pg_last_xact_replay_timestamp()`. Both are one `fleet_exec` away.\n\n```\nfleet_exec  target=\"db,primary\"  \\\n  command=\"psql -xtAc \\\"SELECT client_addr, state, pg_wal_lsn_diff(sent_lsn, replay_lsn) AS lag_bytes FROM pg_stat_replication;\\\"\"\n\nfleet_exec  target=\"db,replica\"  \\\n  command=\"psql -tAc 'SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;'\"\n\nfleet_exec  target=\"db,replica\"  \\\n  command=\"psql -tAc 'SELECT CASE WHEN pg_is_in_recovery() THEN 1 ELSE 0 END;' | grep -q 1 && echo OK_RECOVERY || echo NOT_A_REPLICA\"\n```\n\nRun from `plan` mode: these are all read-only queries, so there's zero risk even on a busy primary. The first query, sent to `db-primary`, shows you both replicas as rows with their `state` (`streaming` is what you want) and `lag_bytes`. The second, fanned out across the `db,replica` tag, returns each replica's replay lag side by side. A replica that has fallen out of streaming shows up either as a missing row in the primary's view or a ballooning `replay_lag`.\n\nNote: a healthy LAN replica usually sits under a few hundred milliseconds of replay lag. A sudden jump to tens of seconds (or a replica vanishing from `pg_stat_replication` entirely) means streaming broke, often a WAL retention or network issue. Catching it here, across both replicas in one call, beats discovering it when a read query returns stale data.\n\nSometimes you just need to confirm a connection string or a backup target path without any chance of editing it. 
With the host in **plan** mode, `read_file` gives you exactly that: set the host to `plan` mode, `read_file` the config, and narrow the output to the keys you care about with a read-only `exec`.\n\n```\nset_mode   agent_id=db-primary  mode=plan\n\nread_file  agent_id=db-primary  path=/srv/app/.env\n\nexec       agent_id=db-primary  command=\"grep -E '^(PGHOST|PGDATABASE|BACKUP_DIR)=' /srv/app/.env\"\n```\n\nBecause the agent is read-only, even if you (or the AI) followed up with a `write_file`, it would be rejected. On top of that, a hard **deny-list**, covering paths like `/etc/shadow` and `/boot`, applies on **every** host in **every** mode, including `bypass`, so the truly sensitive system files are never readable or writable through the agent regardless of how you set things up.\n\nImportant: `plan` mode is the right default for production database hosts. Keep them in `plan` day to day and only flip a single host to `edit` for the specific window you're changing something, then flip it straight back. It turns \"I'll be careful\" into \"I literally can't write right now.\"\n\nFinally, get the dump off the primary and onto the offsite `backup-box`. `send_file` streams a file **host→host over a direct UDP data channel** (opened on demand, with automatic relay fallback) and verifies it end-to-end with a **SHA-256** checksum. 
Because the receiving side performs a write, the destination host needs `edit` or `bypass`.\n\nPut `backup-box` in `edit` so it can write the incoming file, `send_file` the latest dump from `db-primary` to `backup-box`, then confirm it with `file_stat` on the receiver.\n\n```\nset_mode     agent_id=backup-box  mode=edit\n\nsend_file    agent_id=db-primary  \\\n  path=/srv/backups/app_prod-2026-06-18.sql.gz \\\n  to=backup-box \\\n  dest=/srv/offsite/app_prod-2026-06-18.sql.gz\n\nfile_stat    agent_id=backup-box  path=/srv/offsite/app_prod-2026-06-18.sql.gz\n```\n\nThe transfer goes peer-to-peer where the network allows, so a 42 GB dump doesn't have to round-trip through the relay. If a direct UDP path can't be established, it transparently falls back to the relay so the copy still completes. The SHA-256 check means you find out about a corrupted transfer immediately, not when a restore fails three weeks later. If you'd rather pull from the receiving side, `transfer_get` is the mirror-image tool.\n\nTip: in the browser fleet-chat panel there's a Files view that does the same move with a live progress bar, plus a binary-safe chunked download of any file through the relay. Handy when you want to eyeball a dump's size or grab one to your laptop without dropping to the CLI.\n\nSet `backup-box` back to `plan` when the copy is done, so the offsite store is read-only at rest:\n\n```\nset_mode  agent_id=backup-box  mode=plan\n```\n\nIt's the same idea, but the cron is installed and managed through one AI interface across the whole fleet, and `schedule_add` registers it **on the host** so it survives relay outages and reboots. The difference is operational: you add, list, verify, and remove backup schedules on four boxes from one place, and you get per-host confirmation that each dump actually ran, instead of trusting four independent crontabs you never look at.\n\nYes. 
`schedule_add` writes a host-local cron that lives entirely on the database server. The relay link and your AI session are only needed to **create, list, or remove** the schedule, not to run it. Your `0 0 3 * * *` dump fires at 03:00 whether or not anyone is connected.\n\nStart every host in `plan` mode, which is read-only: you can query `pg_stat_replication`, `read_file` a `.env`, and check dump freshness with zero write risk. Flip a single host to `edit` only for the specific change you're making (`edit` auto-backs-up before overwriting), and a hard deny-list on paths like `/etc/shadow` and `/boot` applies even in `bypass`. All command payloads and results are end-to-end encrypted; the relay forwards ciphertext and never sees your data or keys.\n\nYes, the workflow is database-agnostic. Swap the command inside `schedule_add`: `mysqldump --single-transaction ... | gzip` for MySQL, or `sqlcmd -Q \"BACKUP DATABASE ...\"` on a Windows host targeted via `os:windows`. The scheduling, 14-day retention prune, freshness verification, and SHA-256 transfer all stay identical.\n\nRun **fleet_update_check**, which tells you which idle hosts have a newer agent version available, then run `npm i -g remote-agents@latest` on those hosts. 
One MCP interface, five hosts, zero manual SSH sessions: a nightly **scheduled pg_dump** with 14-day retention runs host-local on the primary, freshness is checked with `file_stat`, replicas get migrations and lag checks, reads stay safe in `plan` mode, and the dump lands offsite.\n\nInstall: `npm i -g remote-agents` → [package on npm] · [source and docs]", "url": "https://wpnews.pro/news/automate-database-backups-across-a-server-fleet-with-ai-7-recipes-for-2026", "canonical_source": "https://dev.to/bitwiserokos/automate-database-backups-across-a-server-fleet-with-ai-7-recipes-for-2026-395n", "published_at": "2026-06-18 14:13:35+00:00", "updated_at": "2026-06-18 14:21:17.910951+00:00", "lang": "en", "topics": ["developer-tools", "artificial-intelligence", "ai-agents", "large-language-models", "ai-infrastructure"], "entities": ["remote-agents", "PostgreSQL", "SQL Server", "Claude", "opencode", "Cloudflare", "AES-GCM-256", "MCP"], "alternates": {"html": "https://wpnews.pro/news/automate-database-backups-across-a-server-fleet-with-ai-7-recipes-for-2026", "markdown": "https://wpnews.pro/news/automate-database-backups-across-a-server-fleet-with-ai-7-recipes-for-2026.md", "text": "https://wpnews.pro/news/automate-database-backups-across-a-server-fleet-with-ai-7-recipes-for-2026.txt", "jsonld": "https://wpnews.pro/news/automate-database-backups-across-a-server-fleet-with-ai-7-recipes-for-2026.jsonld"}}