The most useful HN thread this week wasn't a product launch. It was a question:
"Ask HN: What makes an AI agent framework production-ready vs. a toy?"
The answers were more practical than I expected. Not "uses Kubernetes" or "has enterprise support." The community pointed at specific, buildable behaviors. I went through the thread and turned it into a checklist you can run against your OpenClaw setup today — with the specific OpenClaw primitives that implement each item.
Toy agents: You ask "what happened?" and the agent tells you a story.
Production agents: You open a log and see exactly what ran, in what order, with what inputs, and what came back.
In OpenClaw, this means:
openclaw logs --tail 100
openclaw session history <session-key> --limit 50
openclaw config get logging.level # should be debug or trace
The specific things you should be able to answer from logs alone:
If you can't answer those five questions from your logs, you're running a toy.
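To make "what ran, in what order, with what inputs, and what came back" concrete, here is a minimal sketch of a one-line-per-tool-call log record. This is not OpenClaw's log format; the function name and every field name are assumptions chosen for illustration.

```python
import json
from typing import Any


def tool_call_record(seq: int, tool: str, inputs: dict[str, Any],
                     output: Any, started: float, ended: float) -> str:
    """Serialize one tool call as a single JSON line (hypothetical schema)."""
    return json.dumps({
        "seq": seq,                # position in the run, so order is recoverable
        "tool": tool,              # what ran
        "inputs": inputs,          # what it was given
        "output": output,          # what came back
        "duration_ms": round((ended - started) * 1000),
    }, sort_keys=True)
```

One JSON object per line keeps the log greppable and lets you replay a request by filtering on a single field.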
Toy agents: One model failure cascades into everything failing.
Production agents: Each failure is contained, logged, and recovered from without losing work.
In OpenClaw, this is the fallback chain:
{
  "payload": {
    "fallbacks": [
      "nvidia/qwen3.5-122b-a10b",
      "ollama/qwen3.5:27b-q4_K_M",
      "nvidia/nemotron-nano-12b-v2-vl",
      "ollama/qwen3.5:9b",
      "minimax-portal/MiniMax-M2.7",
      "minimax-portal/MiniMax-M3"
    ]
  }
}
The fallbacks span three providers. When MiniMax is overloaded, the agent doesn't die: it works down the list in order, trying Nvidia's endpoint and local Ollama models first, then other MiniMax models. The work continues.
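The behavior a fallback chain gives you can be sketched in a few lines. This is an illustration of the pattern, not OpenClaw's implementation: `call_with_fallbacks`, `AllModelsFailed`, and the `call` callback are hypothetical names, and the real runtime decides for itself which errors trigger a fallback.

```python
from typing import Callable, Sequence


class AllModelsFailed(Exception):
    """Raised when the primary and every fallback have failed."""


def call_with_fallbacks(primary: str, fallbacks: Sequence[str],
                        call: Callable[[str, str], str],
                        prompt: str) -> tuple[str, str]:
    """Try the primary model, then each fallback in list order.

    Returns (model_that_answered, response). Every failure is recorded,
    so the final error explains what was tried and why each attempt failed.
    """
    errors: list[tuple[str, str]] = []
    for model in [primary, *fallbacks]:
        try:
            return model, call(model, prompt)
        except Exception as exc:  # a real version would catch narrower errors
            errors.append((model, repr(exc)))
    raise AllModelsFailed(errors)
```

Returning the model name alongside the response means the log can show which link in the chain actually did the work.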
The circuit breaker pattern: if a tool fails 3 times in a row, stop trying it and tell the user. Add this to your cron job payloads:
{
  "payload": {
    "timeoutSeconds": 120,
    "lightContext": true
  }
}
The timeout is what turns a hung call into a countable failure. If a call hasn't returned in 120 seconds, it counts as a failure and the agent moves to the next fallback.
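For reference, the "3 failures in a row, then stop" rule looks roughly like this when written out. It is a generic sketch of the circuit breaker pattern under my own naming (`CircuitBreaker`, `CircuitOpen`), not something OpenClaw ships, and it leaves out the half-open/reset timer a fuller version would have.

```python
from typing import Any, Callable


class CircuitOpen(Exception):
    """Raised when a tool has failed too many times in a row."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.failures: dict[str, int] = {}  # tool name -> consecutive failures

    def is_open(self, tool: str) -> bool:
        return self.failures.get(tool, 0) >= self.max_failures

    def call(self, tool: str, fn: Callable[..., Any],
             *args: Any, **kwargs: Any) -> Any:
        # Refuse immediately once the breaker is open: no more attempts.
        if self.is_open(tool):
            raise CircuitOpen(
                f"{tool} disabled after {self.max_failures} consecutive failures")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures[tool] = self.failures.get(tool, 0) + 1
            raise
        self.failures[tool] = 0  # any success resets the streak
        return result
```

Catching `CircuitOpen` at the top level is the natural place to "tell the user" that a tool has been taken out of rotation.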
Toy agents: The agent can do anything, including things you didn't intend.
Production agents: Each tool has an explicit permission boundary that the agent cannot exceed.
In OpenClaw, this is the tool_policy in skills. The deny list is the whole point:
name: safe-exec
description: Exec tool with hard limits — no rm -rf, no curl|bash, no cred exfil
system_prompt_addendum: |
  You have exec access. You may not:
  - Run any command containing 'rm -rf' without explicit user approval
  - Run any command containing 'curl | sh' or 'wget | bash'
  - Access environment variables containing secrets (OPENAI_KEY, ANTHROPIC_KEY, etc)
  - Write to any path outside /home/themachine/.openclaw/workspace/
  If a request matches any of these patterns, refuse and explain why.
tool_policy:
  allow: [exec, read_file]
  deny: [write_file, http_request, browser]
The agent can read and execute, but not write arbitrary files or make outbound HTTP calls. The deny list is the security surface.
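The two layers in that skill (a tool allow/deny list plus command patterns the agent must refuse) can be sketched as plain checks. This is an illustration only: the function names and regexes are my assumptions, OpenClaw's actual enforcement is not shown here, and string matching like this is easy to evade, so treat it as a backstop and not as a sandbox.

```python
import re

# Mirrors the tool_policy from the skill above.
ALLOW = {"exec", "read_file"}
DENY = {"write_file", "http_request", "browser"}

# Hypothetical patterns for two of the prompt-level rules.
BLOCKED_COMMAND_PATTERNS = [
    re.compile(r"\brm\s+-rf\b"),                          # rm -rf
    re.compile(r"\b(curl|wget)\b[^|]*\|\s*(sh|bash)\b"),  # curl ... | sh
]


def tool_allowed(tool: str) -> bool:
    """Default-deny: a tool must be allowed and must not be denied."""
    return tool in ALLOW and tool not in DENY


def command_allowed(command: str) -> bool:
    """Reject commands matching any blocked pattern."""
    return not any(p.search(command) for p in BLOCKED_COMMAND_PATTERNS)
```

Note the default-deny shape: a tool that appears in neither list is rejected, which is the safer failure mode when a new tool gets added later.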
Toy agents: Every session starts from scratch. The agent has no memory.
Production agents: State persists across sessions, survives restarts, and has explicit recovery logic.
In OpenClaw, this is the 3-level memory system:
memory/YYYY-MM-DD.md → Daily log (raw events, what happened)
MEMORY.md → Curated knowledge (decisions, context, patterns)
~/self-improving/ → Execution memory (what worked, what didn't)
The daily log is the source of truth. MEMORY.md is what survives compaction. The self-improving directory is where patterns compound.
For state that must survive a restart (cron job counters, pending tasks, error states):
{
  "name": "cron-health-check",
  "payload": {
    "kind": "agentTurn",
    "message": "Check all cron jobs. If any are in error state for >2 hours, run openclaw cron run --id <jobId>. Write results to logs/cron-health-$(date +%Y%m%d).json"
  }
}
The health state is written to a file, not stored in memory. When the agent restarts, it reads the file and knows where it left off.
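If you write that state file from your own scripts, the detail that matters is not leaving a half-written file behind when the process dies mid-write. A minimal sketch, assuming a JSON state file; `save_state` and `load_state` are hypothetical helpers, not OpenClaw functions.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Write state atomically: temp file in the same directory, then rename."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk
        os.replace(tmp, path)     # readers see the old or new file, never a partial one
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def load_state(path: Path, default: dict) -> dict:
    """Read state after a restart; fall back to a default on first run."""
    try:
        return json.loads(path.read_text())
    except FileNotFoundError:
        return default
```

The temp file lives in the same directory as the target so the final rename stays on one filesystem.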
Toy agents: You have to watch them to know they're working.
Production agents: They send you a message when something goes wrong.
In OpenClaw, this is the failureAlert on every cron job:
{
  "failureAlert": {
    "after": 1,
    "channel": "telegram",
    "to": "749348Tracker",
    "cooldownMs": 3600000,
    "mode": "announce"
  }
}
After 1 failure, Telegram alert. 1-hour cooldown so you're not spammed if the job is retrying. You don't have to watch the agent — it watches itself and tells you when something breaks.
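The interplay of `after` and `cooldownMs` is easier to reason about written out. The sketch below encodes my reading of those two settings (alert once the failure streak reaches `after`, then stay quiet until the cooldown has elapsed); OpenClaw's exact semantics may differ, and `FailureAlerter` is a hypothetical class.

```python
from typing import Optional


class FailureAlerter:
    def __init__(self, after: int = 1, cooldown_ms: int = 3_600_000) -> None:
        self.after = after
        self.cooldown_ms = cooldown_ms
        self.consecutive_failures = 0
        self.last_alert_ms: Optional[int] = None

    def record(self, ok: bool, now_ms: int) -> bool:
        """Record one job run; return True if an alert should be sent."""
        if ok:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures < self.after:
            return False  # streak not long enough yet
        if (self.last_alert_ms is not None
                and now_ms - self.last_alert_ms < self.cooldown_ms):
            return False  # still inside the cooldown window
        self.last_alert_ms = now_ms
        return True
```

With the defaults above, a job that fails every 30 minutes produces one alert per hour instead of one per run.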
The health check cron runs every 30 minutes:
openclaw cron list --json | python3 -c "
import sys, json
jobs = json.load(sys.stdin)
for job in jobs:
    if job.get('consecutiveErrors', 0) >= 2:
        print(f'Job {job[\"id\"]} has {job[\"consecutiveErrors\"]} consecutive errors')
"
Any job with 2+ consecutive errors gets flagged, and the health-check job retriggers it. You don't find out about failures at 9am; you find out within an hour and the job tries to recover automatically.
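Going from "flag it" to "retrigger it" is one more step. The sketch below only builds the commands and does not run them, so you can review the output before wiring it up. It assumes, as the one-liner above does, that `openclaw cron list --json` prints a JSON array of jobs with `id` and `consecutiveErrors` fields, and it reuses the `openclaw cron run --id <jobId>` form from the health-check message; `retrigger_commands` itself is a hypothetical helper.

```python
import json


def retrigger_commands(cron_list_json: str, threshold: int = 2) -> list[list[str]]:
    """Return one argv list per job whose error streak meets the threshold."""
    jobs = json.loads(cron_list_json)
    return [
        ["openclaw", "cron", "run", "--id", str(job["id"])]
        for job in jobs
        if job.get("consecutiveErrors", 0) >= threshold
    ]
```

Each entry is an argv list, so it can be passed straight to `subprocess.run(cmd, check=True)` once you trust the output.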
Go through each item:
Run openclaw logs --tail 20. Can you follow a single request through the log?
If you answered no to any of these, that's your next hour of work.
The thread's conclusion was: production-ready agents aren't defined by their models or their benchmarks. They're defined by what happens when something goes wrong. The checklist above is a map of "what goes wrong" for OpenClaw operators — and the specific primitives that handle each case.
Ship the one that's broken first. Then the next. Then you have a production agent.