[ARTICLE · art-42051] src=dev.to ↗ pub= topic=artificial-intelligence verified=true sentiment=↑ positive

Building FailureDNA: an agent memory that knows when not to trust itself

A developer built FailureDNA, a persistent memory system for incident-response agents that prevents them from repeating known failures or blindly reusing stale fixes. The system uses a deterministic validity gate to evaluate past outcomes before allowing the model to act, enforcing avoidance of prior failures. In benchmarks, FailureDNA lifted first-action resolution, cut unsafe first actions, and repeated zero historical failures or stale successes.

4 min read · published Jun 27, 2026

Submitted for the Global AI Hackathon Series with Qwen Cloud, Track 1: MemoryAgent.

Give an incident-response agent a vector database of past incidents and it will do something that looks smart and is quietly dangerous: when a new outage resembles an old one, it retrieves the most similar past incident and reuses whatever action it finds there.

The problem is that similarity is not applicability. The most similar past incident might be the one where restart_service failed. Or where increase_connection_pool worked, but only because the database driver was psycopg2 and the topology was single-region, both of which have since changed. A cosine score of 1.0 tells you the symptoms rhyme. It tells you nothing about whether the fix still holds.

In incident response, that gap is expensive. Repeating a remediation that already failed burns the most costly minutes of an outage; reusing a fix whose preconditions have drifted can make the incident worse. So I built FailureDNA: a persistent memory that accumulates real outcomes and reasons about whether past experience should be used, inspected, or avoided, all before the model is allowed to act on it.

The architecture has one opinionated rule: the model selects; it never decides what's valid.

Incident
  -> embed symptoms (Qwen text-embedding-v3)
  -> pgvector semantic search on Alibaba Cloud RDS
  -> fuse semantic + keyword scores
  -> DETERMINISTIC validity gate  <- the important part
  -> Qwen picks one allowlisted action (validated JSON)
  -> execute -> persist the real outcome back to memory
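The "Qwen picks one allowlisted action (validated JSON)" step can be illustrated with a minimal sketch. The JSON shape (a single "action" field), the function name parse_selection, and the InvalidSelection exception are assumptions made for illustration; the article does not show the project's actual schema.

```python
import json


class InvalidSelection(ValueError):
    """Raised when the model's reply cannot be trusted as a selection."""


def parse_selection(raw: str, allowed: list[str]) -> str:
    """Return the chosen action only if the reply is well-formed and allowlisted.

    Assumed reply shape (hypothetical): {"action": "<name>"}.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidSelection(f"not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise InvalidSelection("expected a JSON object")
    action = payload.get("action")
    if not isinstance(action, str):
        raise InvalidSelection("missing string field 'action'")
    if action not in allowed:
        # The allowlist is whatever survived the gate, so anything else is rejected.
        raise InvalidSelection(f"action {action!r} is not allowlisted")
    return action
```

Anything that raises here would simply never reach the execute step, which is the property the pipeline relies on.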

The validity gate is deliberately boring and deterministic:

Prior outcome   Environment match                         Disposition
failure         any                                       avoid
success         full match                                use
success         driver / topology / config hash changed   inspect

No model decides whether a memory is trustworthy. And critically, avoid is enforced, not advised: an action with a symptom-matching prior failure is removed from the candidate list before the model sees it. The agent cannot repeat a known failure even if it wanted to, which matters because a live LLM handed the same memories as a "hint" will sometimes ignore the hint. The creative part (which action, given the evidence) goes to Qwen; the part that must never hallucinate (is this memory valid? did this action succeed?) stays in deterministic code.
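The three-row table and the hard removal of avoid actions can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the Environment and Memory types, their field names, and the gate function are hypothetical, and the memories passed in are assumed to be the symptom-matching ones already returned by retrieval. The rule that inspect outranks use when one action has several memories is also an assumption; the article only states that a prior failure always wins.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    USE = "use"
    INSPECT = "inspect"
    AVOID = "avoid"


# Precedence when one action has several memories. Avoid winning is from the
# article; inspect outranking use is an assumption.
RANK = {Disposition.USE: 0, Disposition.INSPECT: 1, Disposition.AVOID: 2}


@dataclass(frozen=True)
class Environment:
    # Hypothetical fields mirroring the "driver / topology / config hash" row.
    driver: str
    topology: str
    config_hash: str


@dataclass(frozen=True)
class Memory:
    action: str
    succeeded: bool           # the real outcome persisted after execution
    environment: Environment  # environment at the time of the past incident


def disposition(memory: Memory, current: Environment) -> Disposition:
    """Apply the table directly; no model call is involved."""
    if not memory.succeeded:
        return Disposition.AVOID    # failure, any environment
    if memory.environment == current:
        return Disposition.USE      # success, full match
    return Disposition.INSPECT      # success, but something drifted


def gate(candidates, memories, current):
    """Return (actions the model may see, strongest disposition per action)."""
    verdicts = {}
    for m in memories:
        d = disposition(m, current)
        prev = verdicts.get(m.action)
        if prev is None or RANK[d] > RANK[prev]:
            verdicts[m.action] = d
    # Enforcement: avoided actions are dropped, not merely flagged.
    allowed = [a for a in candidates if verdicts.get(a) is not Disposition.AVOID]
    return allowed, verdicts
```

Because the filter runs before prompting, the model only ever chooses among actions the gate left in the list.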

I used Qwen Cloud through its OpenAI-compatible DashScope endpoint, which made two things nearly free:

text-embedding-v3 turns incident symptoms into 1024-d vectors for pgvector search. Hybrid retrieval fuses semantic similarity (weight 0.70) with keyword overlap (0.30), so it catches both paraphrased and exact-token symptoms.

Action selection runs at temperature=0 with thinking disabled: fast, deterministic-ish output that I validate before anything executes.

Because it's OpenAI-compatible, the whole client is a thin, well-typed wrapper with explicit timeouts and one retry, with no exotic SDK to fight.
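The 0.70 / 0.30 fusion can be sketched in a few lines. The weights come from the article; everything else is an assumption for illustration: keyword overlap is defined here as Jaccard overlap of lowercase tokens, negative cosine values are clamped to zero, and the cosine is computed in plain Python, whereas in the described system the similarity would come from pgvector.

```python
import re

SEMANTIC_WEIGHT = 0.70  # from the article
KEYWORD_WEIGHT = 0.30   # from the article


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_overlap(query, doc):
    """Jaccard overlap of lowercase tokens (an assumed definition)."""
    q = set(re.findall(r"[a-z0-9_]+", query.lower()))
    d = set(re.findall(r"[a-z0-9_]+", doc.lower()))
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def fused_score(query_vec, doc_vec, query_text, doc_text):
    """Weighted sum of semantic and keyword signals, in the range 0..1."""
    semantic = max(0.0, cosine(query_vec, doc_vec))  # clamp negatives (assumption)
    lexical = keyword_overlap(query_text, doc_text)
    return SEMANTIC_WEIGHT * semantic + KEYWORD_WEIGHT * lexical
```

The keyword term is what lets an exact token such as an error code pull a memory up the ranking even when its embedding is only moderately close.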

A demo where the new thing wins is easy to fake, so FailureDNA ships a benchmark designed to be hard on itself: three modes (no_memory, naive, failuredna) on identical seeded history, hidden simulator outcomes, evaluator-only safe/unsafe labels, isolated memory per mode, and static shortcut baselines (always_inspect_downstream, …) to check it isn't just rediscovering that one action is usually right.

FailureDNA lifts first-action resolution well above the naive agent, cuts unsafe first actions sharply, resolves in fewer actions, and repeats zero historical failures and zero stale successes. The honest caveat I left in the open: in this small scenario set, a static always-inspect policy also scores well, which is exactly why the shortcut audit exists. FailureDNA's value isn't a magic action; it's that it never repeats a known failure and never blindly reuses a stale fix as environments change, which is the behavior that generalizes beyond a fixed benchmark.

The backend runs as a custom container on Alibaba Cloud Function Compute (FastAPI, port 9000), memory persists in ApsaraDB RDS for PostgreSQL + pgvector (HNSW), and the image lives in ACR Personal Edition. A few things bit, and are worth writing down for the next person:

The *.fcapp.run domain forces downloads. Content-Disposition: attachment is added to HTML and JSON responses, so a browser downloads your dashboard or health JSON instead of rendering it. I serve the UI from GitHub Pages and added a small /health/ready […]

[…] Access-Control-* headers (it even reflects the request origin). The app's only CORS responsibility is to return […]

[…] /health/cors-debug endpoint and a build marker so "is my new code actually live?" is a one-glance check.

The most interesting open problem is the inspect disposition. Today the deterministic gate hard-removes avoid actions but leaves inspect ones available with a warning. The right next step is a real verification tool behind inspect, so that a stale success is checked against the current environment, not just flagged. That keeps the thesis intact: let the model be creative where creativity helps, and let deterministic code (and real checks) hold the line where being wrong is expensive.

Try it: Live dashboard · API status · GitHub (MIT)

Built with Qwen Cloud + Alibaba Cloud Function Compute and RDS pgvector.
