The Production-Ready AI Agent Checklist (Updated For 2026)

The most useful HN thread this week wasn't a product launch. It was a question:

"Ask HN: What makes an AI agent framework production-ready vs. a toy?"

The answers were more practical than I expected. Not "uses Kubernetes" or "has enterprise support." The community pointed at specific, buildable behaviors. I went through the thread and turned it into a checklist you can run against your OpenClaw setup today — with the specific OpenClaw primitives that implement each item.

Toy agents: You ask "what happened?" and the agent tells you a story.

Production agents: You open a log and see exactly what ran, in what order, with what inputs, and what came back.

In OpenClaw, this means:

openclaw logs --tail 100

openclaw session history <session-key> --limit 50

openclaw config get logging.level  # should be debug or trace

The specific things you should be able to answer from logs alone:

If you can't answer those five questions from your logs, you're running a toy.
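To make "what ran, in what order, with what inputs, and what came back" concrete, here is a minimal sketch of a per-step trace record. This is not OpenClaw's log format: `TraceLog` and its field names are hypothetical, chosen only to show what a log line needs to carry so you can reconstruct a run from the log alone.

```python
import json
import time


class TraceLog:
    """Hypothetical append-only trace: one JSON line per tool call."""

    def __init__(self):
        self.lines = []

    def record(self, tool, inputs, output):
        entry = {
            "step": len(self.lines) + 1,  # order of execution
            "ts": time.time(),            # when it ran
            "tool": tool,                 # what ran
            "inputs": inputs,             # with what inputs
            "output": output,             # what came back
        }
        self.lines.append(json.dumps(entry, sort_keys=True))
        return entry


trace = TraceLog()
trace.record("read_file", {"path": "notes.txt"}, "12 lines")
trace.record("exec", {"cmd": "wc -l notes.txt"}, "12 notes.txt")
for line in trace.lines:
    print(line)
```

If every step of a run produces a line like this, the answer to "what happened?" is a grep, not a story.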

Toy agents: One model failure cascades into everything failing.

Production agents: Each failure is contained, logged, and recovered from without losing work.

In OpenClaw, this is the fallback chain:

{
  "payload": {
    "fallbacks": [
      "nvidia/qwen3.5-122b-a10b",
      "ollama/qwen3.5:27b-q4_K_M",
      "nvidia/nemotron-nano-12b-v2-vl",
      "ollama/qwen3.5:9b",
      "minimax-portal/MiniMax-M2.7",
      "minimax-portal/MiniMax-M3"
    ]
  }
}

Six fallbacks across three providers sit behind your primary. When MiniMax is overloaded, the agent doesn't die: it walks the list in order, trying Nvidia's endpoint, then Ollama, and finally the other MiniMax models. The work continues.
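The behavior a fallback chain gives you can be sketched in a few lines. This is an illustration of the pattern, not OpenClaw's implementation: `call_with_fallbacks`, `ModelUnavailable`, `fake_call`, and the primary model name are all hypothetical, and the provider call is a stand-in that pretends MiniMax is overloaded.

```python
class ModelUnavailable(Exception):
    """Raised by a provider call that failed or was overloaded."""


def call_with_fallbacks(prompt, primary, fallbacks, call_model):
    """Try `primary`, then each fallback in list order; return (model, reply)."""
    errors = []
    for model in [primary, *fallbacks]:
        try:
            return model, call_model(model, prompt)
        except ModelUnavailable as exc:
            errors.append(f"{model}: {exc}")  # keep each failure for the log
    raise RuntimeError("all models failed: " + "; ".join(errors))


def fake_call(model, prompt):
    # Stand-in provider: pretend every MiniMax model is overloaded.
    if model.startswith("minimax-portal/"):
        raise ModelUnavailable("overloaded")
    return f"[{model}] reply to: {prompt}"


used, reply = call_with_fallbacks(
    "summarize today's log",
    primary="minimax-portal/primary-model",  # hypothetical primary
    fallbacks=["nvidia/qwen3.5-122b-a10b", "ollama/qwen3.5:27b-q4_K_M"],
    call_model=fake_call,
)
print(used)
```

The order of the list is the policy: the first entry that answers wins, and the collected errors tell you afterwards which providers were skipped and why.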

The circuit breaker pattern: if a tool fails 3 times in a row, stop trying it and tell the user. Add this to your cron job payloads:

{
  "payload": {
    "timeoutSeconds": 120,
    "lightContext": true
  }
}

The timeout is what trips the breaker. If a call hasn't returned in 120 seconds, it counts as a failure and the agent moves to the next fallback.
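The "3 failures in a row, then stop" rule is not something the payload above expresses by itself, so here is a minimal sketch of the counting logic. `CircuitBreaker` and `CircuitOpen` are hypothetical names, not OpenClaw primitives, and the failing tool is a stand-in that always times out.

```python
class CircuitOpen(Exception):
    """Raised when the breaker refuses to call the tool again."""


class CircuitBreaker:
    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.consecutive_failures = 0

    @property
    def is_open(self):
        return self.consecutive_failures >= self.max_failures

    def call(self, fn, *args, **kwargs):
        if self.is_open:
            raise CircuitOpen("tool disabled after repeated failures")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.consecutive_failures += 1  # timeouts and errors both count
            raise
        self.consecutive_failures = 0  # any success resets the streak
        return result


def broken_tool():
    raise TimeoutError("no response within 120 seconds")


breaker = CircuitBreaker(max_failures=3)
for attempt in range(1, 6):
    try:
        breaker.call(broken_tool)
    except TimeoutError:
        print(f"attempt {attempt}: tool failed")
    except CircuitOpen as exc:
        print(f"attempt {attempt}: {exc}")
        break
```

The fourth attempt never reaches the tool. That is the point where the agent should stop retrying and tell the user.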

Toy agents: The agent can do anything, including things you didn't intend.

Production agents: Each tool has an explicit permission boundary that the agent cannot exceed.

In OpenClaw, this is the tool_policy in skills. The deny list is the whole point:

name: safe-exec
description: Exec tool with hard limits — no rm -rf, no curl|bash, no cred exfil
system_prompt_addendum: |
  You have exec access. You may not:
    - Run any command containing 'rm -rf' without explicit user approval
    - Run any command containing 'curl | sh' or 'wget | bash'
    - Access environment variables containing secrets (OPENAI_KEY, ANTHROPIC_KEY, etc)
    - Write to any path outside /home/themachine/.openclaw/workspace/
  If a request matches any of these patterns, refuse and explain why.
tool_policy:
  allow: [exec, read_file]
  deny: [write_file, http_request, browser]

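The agent can read and execute, but it cannot call the write_file, http_request, or browser tools. Because exec still reaches the shell, the prompt addendum covers what the tool policy alone cannot. The deny list is the security surface.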
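A prompt addendum is a request; a policy check is a gate. Here is a minimal sketch of how an allow/deny policy and the workspace path rule could be enforced in code. The semantics are assumptions, not OpenClaw's documented behavior: deny wins over allow, anything unlisted is rejected, and `is_tool_allowed` and `is_inside_workspace` are hypothetical helpers.

```python
import os

# Workspace root taken from the skill's prompt addendum.
WORKSPACE = "/home/themachine/.openclaw/workspace"

policy = {
    "allow": ["exec", "read_file"],
    "deny": ["write_file", "http_request", "browser"],
}


def is_tool_allowed(tool, policy):
    """Deny wins over allow; anything not explicitly allowed is rejected."""
    if tool in policy.get("deny", []):
        return False
    return tool in policy.get("allow", [])


def is_inside_workspace(path, root=WORKSPACE):
    """True only if `path` resolves to a location under `root`."""
    root = os.path.realpath(root)
    target = os.path.realpath(os.path.join(root, path))
    return os.path.commonpath([root, target]) == root


for tool in ["exec", "write_file", "send_email"]:
    print(tool, is_tool_allowed(tool, policy))

for path in ["notes/today.md", "../../etc/passwd", "/etc/passwd"]:
    print(path, is_inside_workspace(path))
```

Resolving the path before comparing it is what stops `../` tricks; a plain string prefix check would not.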

Toy agents: Every session starts from scratch. The agent has no memory.

Production agents: State persists across sessions, survives restarts, and has explicit recovery logic.

In OpenClaw, this is the 3-level memory system:

memory/YYYY-MM-DD.md    → Daily log (raw events, what happened)
MEMORY.md               → Curated knowledge (decisions, context, patterns)
~/self-improving/       → Execution memory (what worked, what didn't)

The daily log is the source of truth. MEMORY.md is what survives compaction. The self-improving directory is where patterns compound.

For state that must survive a restart (cron job counters, pending tasks, error states):

{
  "name": "cron-health-check",
  "payload": {
    "kind": "agentTurn",
    "message": "Check all cron jobs. If any are in error state for >2 hours, run openclaw cron run --id <jobId>. Write results to logs/cron-health-$(date +%Y%m%d).json"
  }
}

The health state is written to a file, not stored in memory. When the agent restarts, it reads the file and knows where it left off.
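Writing state to a file only helps if a crash mid-write can't leave you with half a file. A minimal sketch of that read/write cycle follows, assuming a JSON state file; `save_state`, `load_state`, and the file name are hypothetical and not part of OpenClaw.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write JSON atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())  # make sure bytes hit the disk
        os.replace(tmp_path, path)     # readers see old or new, never partial
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_state(path, default):
    """Return the saved state, or `default` on first run."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return default


with tempfile.TemporaryDirectory() as workdir:
    state_file = os.path.join(workdir, "cron-health.json")
    state = load_state(state_file, {"pending": [], "errors": 0})
    state["errors"] += 1
    save_state(state_file, state)
    restored = load_state(state_file, {"pending": [], "errors": 0})
    print(restored)
```

On restart, the agent calls `load_state` first and picks up where the last run stopped.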

Toy agents: You have to watch them to know they're working.

Production agents: They send you a message when something goes wrong.

In OpenClaw, this is the failureAlert on every cron job:

{
  "failureAlert": {
    "after": 1,
    "channel": "telegram",
    "to": "749348Tracker",
    "cooldownMs": 3600000,
    "mode": "announce"
  }
}

After 1 failure, Telegram alert. 1-hour cooldown so you're not spammed if the job is retrying. You don't have to watch the agent — it watches itself and tells you when something breaks.
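The interaction between "after" and "cooldownMs" is easy to get wrong, so here is a minimal sketch of the decision logic as described above. The semantics are an assumption based on the field names, not a statement of how OpenClaw evaluates them, and `FailureAlert` here is a hypothetical class.

```python
class FailureAlert:
    def __init__(self, after=1, cooldown_ms=3_600_000):
        self.after = after              # failures needed before alerting
        self.cooldown_ms = cooldown_ms  # minimum gap between alerts
        self.last_sent_ms = None

    def should_send(self, consecutive_failures, now_ms):
        if consecutive_failures < self.after:
            return False
        in_cooldown = (
            self.last_sent_ms is not None
            and now_ms - self.last_sent_ms < self.cooldown_ms
        )
        if in_cooldown:
            return False  # already alerted recently; stay quiet
        self.last_sent_ms = now_ms
        return True


alert = FailureAlert(after=1, cooldown_ms=3_600_000)
print(alert.should_send(1, now_ms=0))          # first failure
print(alert.should_send(2, now_ms=60_000))     # retry one minute later
print(alert.should_send(3, now_ms=3_600_000))  # still failing an hour later
```

Under these assumptions the first and third calls alert and the second is suppressed, which is the "told once, not spammed" behavior you want from a retrying job.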

The health check cron runs every 30 minutes:

openclaw cron list --json | python3 -c "
import sys, json
jobs = json.load(sys.stdin)
for job in jobs:
    if job.get('consecutiveErrors', 0) >= 2:
        print(f'Job {job[\"id\"]} has {job[\"consecutiveErrors\"]} consecutive errors')
"

The script above only reports. If any job has 2+ consecutive errors, retrigger it with openclaw cron run --id <jobId>. You don't find out about failures at 9am. You find out within an hour, and the job tries to recover automatically.
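To go from reporting to retriggering, the same filter can produce the commands to run. This is a sketch under assumptions: the job fields (`id`, `consecutiveErrors`) are taken from the snippet above, the sample data is made up, and the helper names are hypothetical. It builds the `openclaw cron run --id` command lines without executing them.

```python
import json

sample = json.loads("""
[
  {"id": "daily-digest", "consecutiveErrors": 0},
  {"id": "inbox-triage", "consecutiveErrors": 2},
  {"id": "backup-notes", "consecutiveErrors": 5},
  {"id": "new-job"}
]
""")


def jobs_to_retrigger(jobs, threshold=2):
    """Return ids of jobs at or above the consecutive-error threshold."""
    return [
        job["id"]
        for job in jobs
        if job.get("consecutiveErrors", 0) >= threshold
    ]


def retrigger_commands(job_ids):
    """Build one argv list per job, ready to hand to subprocess.run."""
    return [["openclaw", "cron", "run", "--id", job_id] for job_id in job_ids]


for argv in retrigger_commands(jobs_to_retrigger(sample)):
    print(" ".join(argv))
```

Keeping the selection separate from the execution means you can dry-run the health check and read the commands before letting it act by itself.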

Go through each item:

Run openclaw logs --tail 20. Can you follow a single request through the log?

If you answered no to any of these, that's your next hour of work.

The thread's conclusion was: production-ready agents aren't defined by their models or their benchmarks. They're defined by what happens when something goes wrong. The checklist above is a map of "what goes wrong" for OpenClaw operators — and the specific primitives that handle each case.

Ship the one that's broken first. Then the next. Then you have a production agent.

Source: dev.to (original article)
