Source: dev.to · Topic: ai-safety

How to Build AI Agents That Don't Delete Your Database

A developer outlines a three-layer safety framework for AI agents that interact with databases, emphasizing structural safeguards over simple prompts. The approach includes action boundaries, pre-execution validation, and post-execution monitoring, with idempotency keys and human-in-the-loop approval for high-risk operations. The system stores before-images of records to enable data-level rollbacks in case of errors.

Published Jun 16, 2026 · 5 min read

Suppose an AI agent starts making bulk edits across thousands of records. Not deleting data, but rewriting descriptions with hallucinated details. The system catches it because an automated validation gate rejects the output. No real client is harmed, but the scenario shows why safety needs to be structural.

If you're building an AI-powered SaaS where an agent can write, update, or delete data, you need a safety framework before you need features. Here's what I've learned from shipping production systems that let LLMs touch real databases.

Most teams start with one guardrail and call it done. A prompt that says "don't delete anything." A confirmation dialog. A rate limit.

That's not enough. I structure agent safety in three layers that each catch a different failure mode.

Layer 1: Action boundaries. The agent can only call functions you explicitly define. No raw SQL access. No direct database writes. Every action goes through a typed function with its own validation.

Layer 2: Pre-execution validation. Before any write happens, the system checks the action against business rules. Is this user authorized? Does the data pass schema validation? Is the operation idempotent?

Layer 3: Post-execution monitoring. After the action completes, you log what happened, compare it to what was expected, and alert on anomalies.
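
As a rough illustration of how the three layers can sit in one code path, here is a minimal in-memory sketch. It is not the author's implementation: the names `registry`, `runAgentAction`, `isKnownAction`, and `auditTrail`, the roles, and the validation rules are all assumptions made up for this example.

```typescript
// Illustrative sketch only: every name and rule below is hypothetical.
type ActionName = "update_listing" | "flag_content";

interface ActionRequest {
  name: string; // requested by the model, so treated as untrusted input
  userRole: "admin" | "viewer";
  payload: Record<string, unknown>;
}

interface ActionHandler {
  validate: (req: ActionRequest) => string | null; // error message, or null when valid
  run: (req: ActionRequest) => string;
}

// Layer 1: the agent can only reach what is registered here.
// There is no delete handler and no raw SQL entry point.
const registry: Record<ActionName, ActionHandler> = {
  update_listing: {
    validate: (req) => {
      if (req.userRole !== "admin") return "not authorized";
      if (typeof req.payload["description"] !== "string") return "description must be a string";
      return null;
    },
    run: (req) => `updated: ${String(req.payload["description"])}`,
  },
  flag_content: {
    validate: () => null,
    run: () => "flagged",
  },
};

// Layer 3 stand-in: a real system would write to a log store and alert on anomalies.
const auditTrail: string[] = [];

function isKnownAction(name: string): name is ActionName {
  return Object.prototype.hasOwnProperty.call(registry, name);
}

function runAgentAction(req: ActionRequest): { ok: boolean; detail: string } {
  if (!isKnownAction(req.name)) {
    auditTrail.push(`rejected: unknown action ${req.name}`);
    return { ok: false, detail: "unknown action" };
  }
  const handler = registry[req.name];
  const problem = handler.validate(req); // Layer 2: checks run before any write
  if (problem !== null) {
    auditTrail.push(`rejected: ${req.name} (${problem})`);
    return { ok: false, detail: problem };
  }
  const detail = handler.run(req);
  auditTrail.push(`executed: ${req.name}`);
  return { ok: true, detail };
}
```

In this sketch a request for `delete_listing` fails at the first check, because no such handler was ever registered. The prompt plays no part in that outcome.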

Here's what this looks like in practice.

The most dangerous property of LLM-generated actions is that they're not naturally idempotent. An agent might call "update job listing" twice because it didn't get a clear confirmation the first time. If that action increments a counter or appends to a field, you've got corrupted data.

A production pipeline processing thousands of records daily needs an idempotency layer. Every write action requires an idempotency key, usually a hash of the action type plus the target record ID plus a timestamp window.

interface AgentAction {
  type: 'update_listing' | 'create_draft' | 'flag_content';
  targetId: string;
  payload: Record<string, unknown>;
  idempotencyKey: string; // hash(type + targetId + timestampWindow)
}

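async function executeAgentAction(action: AgentAction) {
  // Look for a previous execution recorded under the same key.
  const existing = await db.idempotencyLog.findUnique({
    where: { key: action.idempotencyKey }
  });

  // A retry of an already-applied action returns the stored result
  // instead of writing a second time.
  if (existing) {
    return { status: 'already_executed', result: existing.result };
  }

  const result = await performAction(action);

  // Record the key so later retries are recognized.
  // Note: the check above and this write are not atomic. A unique
  // constraint on `key` (assumed here) is what would stop two
  // concurrent calls from both executing.
  await db.idempotencyLog.create({
    data: {
      key: action.idempotencyKey,
      action: action.type,
      result
    }
  });

  return { status: 'executed', result };
}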

The key insight: the agent doesn't decide the idempotency key. The system generates it from the action context. This prevents the agent from accidentally reusing keys or generating collisions.
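
One way the system could derive that key is sketched below, following the formula in the interface comment (type, target ID, timestamp window). The function name `deriveIdempotencyKey`, the one-minute default window, and the choice of SHA-256 are assumptions for illustration; the article does not specify them.

```typescript
import { createHash } from "node:crypto";

// Assumed window size. The article does not say how long the window is.
const DEFAULT_WINDOW_MS = 60_000;

// Hypothetical helper: builds the key on the server side from the action context,
// so the agent never supplies it.
function deriveIdempotencyKey(
  type: string,
  targetId: string,
  nowMs: number,
  windowMs: number = DEFAULT_WINDOW_MS
): string {
  // All timestamps inside the same window map to the same integer.
  const windowIndex = Math.floor(nowMs / windowMs);
  // JSON-encoding the parts avoids ambiguity that plain concatenation would allow
  // (for example "ab" + "c" versus "a" + "bc").
  const material = JSON.stringify([type, targetId, windowIndex]);
  return createHash("sha256").update(material).digest("hex");
}
```

Two caveats apply to this sketch. A retry that straddles a window boundary gets a different key and would run again. And because the payload is not part of the key, two different updates to the same record within one window are treated as duplicates. Whether either trade-off is acceptable depends on the workload.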

A confirmation dialog that says "Are you sure?" is theater, not safety. The agent already committed to the action. The human is just rubber-stamping.

Real human-in-the-loop means the agent proposes, the system validates, and the human approves or rejects with full context. Consider a tool that generates tailored resumes in bulk. Every generated resume goes through a validation step before it's available for download.

The pattern works like this: the agent's proposed action is held in a pending state while the system validates it, and nothing is applied until it is approved or rejected.

For high-risk actions like deleting records or updating financial data, I add a second approval requirement. Two different humans must confirm. It sounds heavy, but it only matters for the dangerous operations. Routine actions like updating a job description can use a single approval or even auto-approve if the automated checks pass.
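
The approval rules described here can be modeled as a small state machine. The sketch below is only an illustration under assumed names (`Proposal`, `propose`, `approve`, `reject`, `REQUIRED_APPROVALS`); it keeps state in memory and leaves out the validation step and the auto-approve path.

```typescript
type Risk = "routine" | "high";
type Status = "pending" | "approved" | "rejected";

interface Proposal {
  id: string;
  risk: Risk;
  status: Status;
  approvers: Set<string>; // distinct reviewers who have confirmed so far
}

// Assumed policy, taken from the text: one approval for routine actions, two for high-risk ones.
const REQUIRED_APPROVALS: Record<Risk, number> = { routine: 1, high: 2 };

// The agent only ever creates proposals; it cannot move them out of "pending".
function propose(id: string, risk: Risk): Proposal {
  return { id, risk, status: "pending", approvers: new Set<string>() };
}

function approve(p: Proposal, reviewer: string): Proposal {
  if (p.status !== "pending") return p; // decided proposals do not change
  // A Set ignores duplicates, so the same reviewer confirming twice counts once.
  const approvers = new Set(p.approvers).add(reviewer);
  const status: Status =
    approvers.size >= REQUIRED_APPROVALS[p.risk] ? "approved" : "pending";
  return { ...p, approvers, status };
}

function reject(p: Proposal): Proposal {
  return p.status === "pending" ? { ...p, status: "rejected" } : p;
}
```

Storing reviewers in a set is what enforces "two different humans": a second click from the same person leaves a high-risk proposal pending.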

Most teams design for success. They assume the agent will do the right thing and plan for that. But the real question is: what happens when the agent does the wrong thing and you don't catch it for six hours?

You need a rollback strategy that works at the data level, not just the application level.

For every write action an agent performs, store a before-image of the affected records. This is a snapshot of the data before the change, stored in a separate audit table. If something goes wrong, you can reconstruct the exact state before the agent touched it.

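// sessionId is not part of AgentAction as declared earlier,
// so it is added to the parameter type here.
async function executeWithRollback(
  action: AgentAction & { sessionId: string }
) {
  // Snapshot the record before the agent touches it.
  const beforeImage = await captureBeforeImage(action.targetId);

  try {
    const result = await performAction(action);

    // Persist both images so the change can be reversed later,
    // even if the problem is found hours afterwards.
    await db.auditLog.create({
      data: {
        action: action.type,
        targetId: action.targetId,
        beforeImage,
        afterImage: result,
        agentSessionId: action.sessionId,
        timestamp: new Date()
      }
    });

    return result;
  } catch (error) {
    // Runs if the action or the audit write fails: put the record
    // back to its prior state, then surface the error.
    await restoreFromBeforeImage(action.targetId, beforeImage);
    throw error;
  }
}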

This pattern saved a project when an agent misclassified records against the wrong geographic zones. The agent was supposed to map buyer preferences to grid tiles, but a prompt bug caused it to assign the wrong resolution. Because before-images existed, the rollback took seconds instead of requiring manual reconstruction across a large dataset.
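
The article does not show `captureBeforeImage` or `restoreFromBeforeImage`. The sketch below is one possible shape for them, using a synchronous in-memory `Map` as a stand-in for the real table. The `Listing` type and its fields are assumptions for illustration.

```typescript
// Hypothetical record shape; the real schema is not given in the article.
interface Listing {
  id: string;
  description: string;
  zone: string;
}

// In-memory stand-in for the database table the agent writes to.
const listings = new Map<string, Listing>();

// Returns a copy of the current row, or null when the row does not exist yet.
// A shallow copy is enough here because Listing has only flat fields;
// nested data would need a deep copy.
function captureBeforeImage(targetId: string): Listing | null {
  const row = listings.get(targetId);
  return row ? { ...row } : null;
}

// Puts the row back exactly as it was before the agent's write.
function restoreFromBeforeImage(targetId: string, image: Listing | null): void {
  if (image === null) {
    // The row did not exist before the action, so rollback means removing it.
    listings.delete(targetId);
  } else {
    listings.set(targetId, { ...image });
  }
}
```

Copying the row matters: if the snapshot were the same object reference as the live row, the agent's edit would silently change the before-image too.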

Agents don't fail the same way twice. A prompt that works perfectly for weeks can start producing bad output because the underlying model changed, or because the data distribution shifted, or because a user found an edge case.

Run automated monitoring on every agent action. Track:

These metrics feed into a dashboard that alerts when any threshold is crossed. Don't wait for a user to report a bug. Let the system tell you when the agent is drifting.
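
A threshold check of this kind might look like the sketch below. The list of tracked metrics did not survive in this copy of the article, so the two used here (error rate and validation rejection rate) are assumptions chosen for illustration, as are the names `ActionOutcome`, `Thresholds`, and `checkDrift`.

```typescript
// Hypothetical per-action record; the fields are illustrative, not the author's list.
interface ActionOutcome {
  ok: boolean;                   // did the action complete without error?
  rejectedByValidation: boolean; // was it blocked by the pre-execution checks?
}

interface Thresholds {
  maxErrorRate: number;     // fraction between 0 and 1
  maxRejectionRate: number; // fraction between 0 and 1
}

// Returns one alert message per threshold that the recent outcomes exceed.
function checkDrift(outcomes: ActionOutcome[], limits: Thresholds): string[] {
  if (outcomes.length === 0) return []; // nothing to measure, so nothing to alert on

  const errorRate = outcomes.filter((o) => !o.ok).length / outcomes.length;
  const rejectionRate =
    outcomes.filter((o) => o.rejectedByValidation).length / outcomes.length;

  const alerts: string[] = [];
  if (errorRate > limits.maxErrorRate) {
    alerts.push(`error rate ${errorRate.toFixed(2)} exceeds ${limits.maxErrorRate}`);
  }
  if (rejectionRate > limits.maxRejectionRate) {
    alerts.push(`rejection rate ${rejectionRate.toFixed(2)} exceeds ${limits.maxRejectionRate}`);
  }
  return alerts;
}
```

Running a check like this over a sliding window of recent actions would surface a rising rejection rate soon after a model or data change, without waiting for a user report.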

The hardest lesson is that you can't prompt your way out of safety problems. No matter how carefully you write the system prompt, the agent will find edge cases you didn't anticipate. The safety framework has to be structural, not instructional.

A prompt that says "never delete records" is a suggestion. A function that doesn't expose a delete operation is a guarantee.

Build your safety at the architecture level. Make it impossible for the agent to do damage, not just unlikely. The prompt is for quality. The code is for safety.

If your team is building AI agents that touch production data and you're wondering whether your safety framework is enough, that's the kind of thing I help with. Happy to compare notes on what's worked and what hasn't.

Written by Abdul Rehman, full-stack AI engineer building production SaaS, MVPs, and AI automation. More at PrimeStrides.
