Recoverable failures for AI coding agents

Source: gist.github.com · Published: 2026-06-19T12:33Z · Topic: ai-agents

A developer proposes using Btrfs snapshots to make AI coding agent failures recoverable by treating agent work as filesystem transactions. The setup involves creating cheap snapshots before agent runs, allowing inspection and rollback of changes. This approach complements Git and trash-backed rm to protect against accidental deletes, overwrites, and generated damage.

AI coding agents are useful precisely because they can run tools, edit many files, execute tests, install dependencies, and iterate quickly. That same ability makes them risky in YOLO mode: a mistaken command, broad glob, broken script, or overconfident refactor can damage a working tree faster than a human can react.

The goal is not to make agents harmless. The goal is to make common failures recoverable.

The proposed agentic setup has three layers:

Git commits       protect intentional source history
trash-backed rm   protects ordinary accidental deletes
Btrfs snapshots   protect deletes, overwrites, generated damage, and bad runs

These layers cover different failure modes. Git is excellent for source history, but it does not protect ignored files, untracked generated state, local config, or the repository metadata itself. Trash-backed rm helps with deletion, but not with overwrites. Btrfs snapshots cover the whole subvolume state at a point in time.

This post focuses on the Btrfs snapshot layer: making bad AI-agent runs recoverable as filesystem transactions. The trash-backed rm layer is a separate defense for accidental deletion; see Safe rm defaults for agent-heavy Linux machines.

Treat agent work as a controlled filesystem transaction:

create a cheap snapshot
let the agent work
inspect the result
keep it, diff it, or roll it back

This is the same basic idea behind several AI-agent sandbox approaches: give the agent real tools, but run those tools in a filesystem layer that can be inspected or discarded.

Examples of related work and discussion:

https://perevillega.com/posts/2026-03-03-ai-sandbox-coding-agents/
https://github.com/mauro3/sandkasten
https://www.agentfs.ai/
https://news.ycombinator.com/item?id=47550282
https://dev.to/alanwest/sandboxing-ai-agent-filesystems-containers-vs-virtual-fs-layers-ffe

The machine uses:

LVM logical volume
  btrfs filesystem (subvolid=5, flat layout)
    ext2_saved        ← btrfs-convert artifact, can be deleted once stable
    @agent_workflow

@agent_workflow is the important part. It is a separate Btrfs subvolume mounted at:

/home/martin/bin/lib/agent_workflow

Keeping agent_workflow as its own subvolume means it can be snapshotted and rolled back independently from the rest of $HOME.

Verify the mount exactly, not just the nearest parent mount:

findmnt -rn -M /home/martin/bin/lib/agent_workflow
sudo btrfs subvolume show /home/martin/bin/lib/agent_workflow

This matters because findmnt --target resolves to the nearest parent mount and can return / when the directory is not actually a mount point. The protected directory should show btrfs as its filesystem type, and btrfs subvolume show should succeed.
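The exact-mount check can also be scripted. The following is a minimal sketch, not the author's tooling: the function names mount_line_ok and is_protected_mount are hypothetical, and the sketch asks findmnt for only the TARGET and FSTYPE columns so that parsing does not depend on the default column order.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative, not from the original post.

# Succeeds only if a "TARGET FSTYPE" line names the expected directory on btrfs.
mount_line_ok() {
  local line="$1" expected="$2" target fstype
  target=${line%% *}   # text before the first space
  fstype=${line#* }    # text after the first space
  [ "$target" = "$expected" ] && [ "$fstype" = "btrfs" ]
}

# -M matches only an exact mount point, so a plain directory yields no match.
is_protected_mount() {
  local dir="$1" line
  line=$(findmnt -rn -o TARGET,FSTYPE -M "$dir") || return 1
  mount_line_ok "$line" "$dir"
}

# Usage (assumed path from the post):
#   is_protected_mount /home/martin/bin/lib/agent_workflow || echo "not protected"
```

Splitting the parsing from the findmnt call keeps the comparison logic checkable without a Btrfs machine.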

We use Snapper on top of Btrfs:

sudo apt install btrfs-progs snapper
sudo snapper -c agent_workflow create-config /home/martin/bin/lib/agent_workflow
sudo chown martin:martin /home/martin/bin/lib/agent_workflow

Do not recursively chown the whole subvolume after creating the Snapper configuration. Snapper keeps its metadata in .snapshots, and that directory must remain owned by root. Changing the owner of .snapshots makes snapshot creation fail with:

IO Error (.snapshots must have owner root).
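One way to guard against this mistake is to check the owner of .snapshots before a run, and to enumerate chown targets with .snapshots pruned. This is a sketch under assumptions: the function names snapshots_owner_ok and chown_targets are hypothetical, and stat -c assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the function names are illustrative.

# Succeeds if DIR/.snapshots exists and is owned by the given numeric UID
# (default 0, i.e. root). Assumes GNU stat, where -c '%u' prints the owner UID.
snapshots_owner_ok() {
  local dir="$1" want="${2:-0}" owner
  owner=$(stat -c '%u' "$dir/.snapshots" 2>/dev/null) || return 1
  [ "$owner" = "$want" ]
}

# Lists every path under DIR except .snapshots and its contents,
# i.e. the paths that an ownership change may safely touch.
chown_targets() {
  local dir="$1"
  find "$dir" -path "$dir/.snapshots" -prune -o -print
}

# Usage (assumed path from the post):
#   snapshots_owner_ok /home/martin/bin/lib/agent_workflow || echo ".snapshots is not root-owned"
```

Running the owner check before each agent run gives an early, readable failure instead of Snapper's IO error.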

Before an agent run:

PRE=$(sudo snapper -c agent_workflow create --print-number --description "before yolo agent run")

After a useful result:

POST=$(sudo snapper -c agent_workflow create --print-number --description "after successful agent run")

Inspect:

sudo snapper -c agent_workflow list
sudo snapper -c agent_workflow status "$PRE..$POST"
sudo snapper -c agent_workflow diff "$PRE..$POST"

If the current run is bad and no post-run snapshot was created, compare or undo against the live filesystem, which Snapper exposes as snapshot 0:

sudo snapper -c agent_workflow status "$PRE..0"
sudo snapper -c agent_workflow diff "$PRE..0"
sudo snapper -c agent_workflow undochange "$PRE..0"

If a post-run snapshot was created and the live filesystem still matches it, $PRE..$POST is also usable:

sudo snapper -c agent_workflow undochange "$PRE..$POST"

In testing, undochange restored deleted files, reverted overwritten files, and removed newly created files.
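Because undochange acts on whatever range it is given, it may be worth validating the snapshot numbers before building the range; an empty $PRE, for example, would otherwise expand to "..0". The helper below is a sketch, and the name snapper_range is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the name snapper_range is illustrative.

# Prints PRE..POST, or PRE..0 (the live filesystem) when POST is omitted.
# Fails if either value is empty or not a non-negative integer.
snapper_range() {
  local pre="$1" post="${2:-0}" n
  for n in "$pre" "$post"; do
    case "$n" in
      ''|*[!0-9]*)
        echo "snapshot numbers must be non-negative integers" >&2
        return 1 ;;
    esac
  done
  echo "$pre..$post"
}

# Usage:
#   range=$(snapper_range "$PRE") && sudo snapper -c agent_workflow undochange "$range"
```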

tools/agent-run does the following:

verify it is running inside the protected agent_workflow subvolume
create a Snapper snapshot
print the snapshot id
run the agent command
print the compare and rollback commands

The CLI refuses to run if the snapshot cannot be created. That matters: the safety mechanism has to be automatic, because YOLO mode is exactly when humans are least likely to remember manual precautions.

The mount check uses findmnt -rn -T "$PWD" against the nearest mount, then asserts that the target is /home/martin/bin/lib/agent_workflow and the filesystem type is btrfs.
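The post does not include the script itself, so the following is only a sketch of a wrapper with the behaviour described above, not the author's implementation. The function name agent_run and the PROTECTED and CONFIG variables are assumptions; the findmnt call adds -o TARGET,FSTYPE to make the output easy to compare.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a wrapper in the spirit of tools/agent-run.
# PROTECTED and CONFIG mirror the paths in the post; agent_run is an assumed name.
PROTECTED=${PROTECTED:-/home/martin/bin/lib/agent_workflow}
CONFIG=${CONFIG:-agent_workflow}

agent_run() {
  local mount pre status
  if [ "$#" -eq 0 ]; then
    echo "usage: agent_run COMMAND [ARGS...]" >&2
    return 2
  fi

  # 1. The nearest mount of the working directory must be the protected
  #    subvolume, and its filesystem type must be btrfs.
  mount=$(findmnt -rn -o TARGET,FSTYPE -T "$PWD") || mount=""
  if [ "$mount" != "$PROTECTED btrfs" ]; then
    echo "refusing to run: $PWD is not inside $PROTECTED on btrfs" >&2
    return 1
  fi

  # 2. Create the pre-run snapshot; without a numeric id, do not start the agent.
  pre=$(sudo snapper -c "$CONFIG" create --print-number \
        --description "before yolo agent run") || pre=""
  case "$pre" in
    ''|*[!0-9]*)
      echo "refusing to run: snapshot could not be created" >&2
      return 1 ;;
  esac
  echo "pre-run snapshot: $pre"

  # 3. Run the agent command and remember its exit status.
  "$@"
  status=$?

  # 4. Print the compare and rollback commands for this run.
  echo "compare:  sudo snapper -c $CONFIG status $pre..0"
  echo "rollback: sudo snapper -c $CONFIG undochange $pre..0"
  return "$status"
}

# Saved as a standalone script, the file would end with: agent_run "$@"
```

The snapshot is created before the agent starts and the function returns early on any failure, so the agent never runs without a rollback point.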

Example:

cd /home/martin/bin/lib/agent_workflow
agent-run claude --dangerously-skip-permissions

On a bad run, roll back with the commands printed at exit:

sudo snapper -c agent_workflow undochange 3..0

Remaining risks:

network exfiltration
writes outside the protected subvolume
credential access
destructive commands run with elevated privileges
snapshot deletion by a process with enough permission

URL: https://gist.github.com/monperrus/a7aa344dc84c76e5ec569a646b31eab9
