Building FailureDNA: an agent memory that knows when not to trust itself

wpnews.pro


Source: dev.to · Published 2026-06-27


A developer built FailureDNA, a persistent memory system for incident-response agents that prevents them from repeating known failures or blindly reusing stale fixes. The system uses a deterministic validity gate to evaluate past outcomes before allowing the model to act, enforcing avoidance of prior failures. In benchmarks, FailureDNA lifted first-action resolution, cut unsafe first actions, and repeated zero historical failures or stale successes.


Submitted for the Global AI Hackathon Series with Qwen Cloud — Track 1: MemoryAgent.

Give an incident-response agent a vector database of past incidents and it will do something that looks smart and is quietly dangerous: when a new outage resembles an old one, it retrieves the most similar past incident and reuses whatever action it finds there.

The problem is that similarity is not applicability. The most similar past incident might be the one where restart_service failed. Or where increase_connection_pool worked, but only because the database driver was psycopg2 and the topology was single-region, both of which have since changed. A cosine score of 1.0 tells you the symptoms rhyme. It tells you nothing about whether the fix still holds.

In incident response, that gap is expensive. Repeating a remediation that already failed burns the most costly minutes of an outage; reusing a fix whose preconditions have drifted can make the incident worse. So I built FailureDNA: a persistent memory that accumulates real outcomes and reasons about whether past experience should be used, inspected, or avoided — before the model is allowed to act on it.

The architecture has one opinionated rule: the model selects; it never decides what's valid.

Incident
  -> embed symptoms (Qwen text-embedding-v3)
  -> pgvector semantic search on Alibaba Cloud RDS
  -> fuse semantic + keyword scores
  -> DETERMINISTIC validity gate  <- the important part
  -> Qwen picks one allowlisted action (validated JSON)
  -> execute -> persist the real outcome back to memory
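The "validated JSON" step is where the model's output meets deterministic code. Below is a minimal Python sketch of that check. It assumes the model is prompted to reply with a JSON object of the form {"action": "<name>"}. The allowlist contents, the reply shape, and the names ALLOWED_ACTIONS, InvalidSelection and parse_selection are illustrative and are not taken from the FailureDNA code.

```python
import json

# Hypothetical allowlist. restart_service and increase_connection_pool appear
# in the article; the real FailureDNA allowlist is not shown.
ALLOWED_ACTIONS = frozenset({
    "restart_service",
    "increase_connection_pool",
    "inspect_downstream",
})


class InvalidSelection(ValueError):
    """Raised when the model's reply cannot be trusted as an action choice."""


def parse_selection(raw_reply: str, candidates: frozenset[str]) -> str:
    """Parse the model's JSON reply and return the chosen action name.

    Assumes the prompt asked for {"action": "<name>"}. Anything else is
    rejected here, before any action executes.
    """
    try:
        payload = json.loads(raw_reply)
    except json.JSONDecodeError as exc:
        raise InvalidSelection(f"reply is not JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise InvalidSelection("reply must be a JSON object")
    action = payload.get("action")
    if not isinstance(action, str):
        raise InvalidSelection("missing string field 'action'")
    # The model may only pick from the candidates the gate let through,
    # and those must also be on the global allowlist.
    if action not in candidates or action not in ALLOWED_ACTIONS:
        raise InvalidSelection(f"action {action!r} is not an allowed candidate")
    return action
```

For example, parse_selection('{"action": "inspect_downstream"}', frozenset({"inspect_downstream"})) returns "inspect_downstream". A reply naming any action outside the candidate set raises InvalidSelection, so a hallucinated action never reaches execution.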

The validity gate is deliberately boring and deterministic:

| Prior outcome | Environment match | Disposition |
|---|---|---|
| failure | any | avoid |
| success | full match | use |
| success | driver, topology, or config hash changed | inspect |
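The table is small enough to encode directly. Here is a sketch in Python. It assumes an environment is fingerprinted by exactly the three fields the table names (driver, topology, config hash). The Environment and Disposition types and the disposition function are hypothetical names, not FailureDNA's own.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(str, Enum):
    USE = "use"
    INSPECT = "inspect"
    AVOID = "avoid"


@dataclass(frozen=True)
class Environment:
    # Hypothetical fingerprint: the three fields the gate table names.
    driver: str
    topology: str
    config_hash: str


def disposition(prior_outcome: str, recorded: Environment, current: Environment) -> Disposition:
    """Apply the gate table to one past outcome.

    failure, any environment                -> avoid
    success, identical environment          -> use
    success, any fingerprint field drifted  -> inspect
    """
    if prior_outcome == "failure":
        return Disposition.AVOID
    if prior_outcome == "success":
        # Dataclasses compare field by field, so drift in any one field breaks equality.
        return Disposition.USE if recorded == current else Disposition.INSPECT
    raise ValueError(f"unknown prior outcome: {prior_outcome!r}")


# Example: a fix that worked on a single-region topology that has since changed.
before = Environment("psycopg2", "single-region", "a1b2")
after = Environment("psycopg2", "multi-region", "a1b2")
print(disposition("success", before, after).value)  # inspect
```

The function contains no similarity score and no model call. That is the point of the design: trust is decided from recorded outcomes and environment facts alone.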

No model decides whether a memory is trustworthy. Critically, avoid is enforced, not advised: an action with a symptom-matching prior failure is removed from the candidate list before the model sees it. The agent cannot repeat a known failure even if it wanted to. That matters, because a live LLM handed the same memories as a "hint" will sometimes ignore the hint. The creative part (which action, given the evidence) goes to Qwen; the part that must never hallucinate (is this memory valid? did this action succeed?) stays in deterministic code.
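Enforcing avoid rather than advising it amounts to filtering the candidate list before the prompt is built. The sketch below assumes each symptom-matching memory already carries the disposition the gate assigned to it. The Memory record and the gate_candidates function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    # Hypothetical record: a past action plus the gate's verdict on it.
    action: str
    disposition: str  # "use", "inspect" or "avoid"


def gate_candidates(actions: list[str], memories: list[Memory]) -> tuple[list[str], dict[str, str]]:
    """Return (actions the model may see, warnings keyed by action).

    Assumes `memories` are already the symptom-matching hits for this incident.
    Any action with an "avoid" memory is dropped outright, so the model never
    sees it. "inspect" actions stay available but carry a warning.
    """
    avoided = {m.action for m in memories if m.disposition == "avoid"}
    warnings = {
        m.action: "prior success under a different environment; verify before use"
        for m in memories
        if m.disposition == "inspect" and m.action not in avoided
    }
    candidates = [a for a in actions if a not in avoided]
    return candidates, warnings
```

If an action has both an avoid memory and an inspect memory, the failure wins and the action is removed. In this sketch, that is what keeps "never repeat a known failure" unconditional.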

I used Qwen Cloud through its OpenAI-compatible DashScope endpoint, which made two things nearly free:

- Embeddings: text-embedding-v3 turns incident symptoms into 1024-d vectors for pgvector search. Hybrid retrieval fuses semantic similarity (weight 0.70) with keyword overlap (weight 0.30), so it catches both paraphrased and exact-token symptoms.
- Action selection: Qwen runs at temperature=0 with thinking disabled, producing fast, deterministic-ish output that I validate before anything executes.

Because the endpoint is OpenAI-compatible, the whole client is a thin, well-typed wrapper with explicit timeouts and one retry. There is no exotic SDK to fight.
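The hybrid retrieval score can be sketched as a weighted sum that uses the article's 0.70/0.30 weights. In FailureDNA the semantic side comes from pgvector over text-embedding-v3 vectors. Here, cosine similarity is computed in plain Python so the example is self-contained. Jaccard token overlap is an assumed stand-in for the keyword measure, which the article does not specify.

```python
import math
import re

# Fusion weights quoted in the article.
SEMANTIC_WEIGHT = 0.70
KEYWORD_WEIGHT = 0.30


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def _tokens(text: str) -> set[str]:
    # Lowercased runs of letters, digits and underscores, so "db-1" -> {"db", "1"}.
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def keyword_overlap(query: str, doc: str) -> float:
    """Jaccard overlap of word tokens: shared tokens / all distinct tokens (assumed measure)."""
    q, d = _tokens(query), _tokens(doc)
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def hybrid_score(q_vec: list[float], d_vec: list[float], q_text: str, d_text: str) -> float:
    """Weighted sum of semantic and keyword similarity."""
    return SEMANTIC_WEIGHT * cosine(q_vec, d_vec) + KEYWORD_WEIGHT * keyword_overlap(q_text, d_text)
```

In this sketch, the keyword term lets an exact shared token such as a service name or error string lift an incident's score even when the embeddings are far apart. The semantic term covers the reverse case of paraphrased symptoms with few words in common.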

A demo where the new thing wins is easy to fake, so FailureDNA ships a benchmark designed to be hard on itself. It runs three modes (no_memory, naive, failuredna) on identical seeded history, with hidden simulator outcomes, evaluator-only safe/unsafe labels, isolated memory per mode, and static shortcut baselines (always_inspect_downstream, …) to check that it isn't just rediscovering that one action is usually right.
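Two of those safeguards, isolated memory per mode and per-mode scoring, can be sketched as follows. The Episode fields and metric names are assumptions modeled on the results the article reports (first-action resolution, unsafe first actions, actions to resolve, repeated failures, stale reuses). They are not FailureDNA's actual benchmark schema.

```python
import copy
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    # Hypothetical per-scenario record. first_action_unsafe would come from the
    # evaluator-only labels, which the agent never sees.
    resolved_on_first_action: bool
    first_action_unsafe: bool
    actions_taken: int
    repeated_known_failure: bool
    reused_stale_success: bool


def isolated_memories(seed_history: list[dict], modes: list[str]) -> dict[str, list[dict]]:
    """Give every mode its own deep copy of the same seeded history, so outcomes
    written back by one mode can never leak into another."""
    return {mode: copy.deepcopy(seed_history) for mode in modes}


def summarize(episodes: list[Episode]) -> dict[str, float]:
    """Aggregate one mode's episodes into rates and counts."""
    n = len(episodes)
    if n == 0:
        raise ValueError("no episodes to summarize")
    return {
        "first_action_resolution": sum(e.resolved_on_first_action for e in episodes) / n,
        "unsafe_first_action_rate": sum(e.first_action_unsafe for e in episodes) / n,
        "mean_actions": sum(e.actions_taken for e in episodes) / n,
        "repeated_failures": sum(e.repeated_known_failure for e in episodes),
        "stale_reuses": sum(e.reused_stale_success for e in episodes),
    }
```

The deep copy matters because every mode writes real outcomes back to memory. Without it, whichever mode ran first would quietly seed the history of the next one.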

FailureDNA lifts first-action resolution well above the naive agent, cuts unsafe first actions sharply, resolves in fewer actions, and repeats zero historical failures and zero stale successes. The honest caveat I left in the open: in this small scenario set, a static always-inspect policy also scores well — which is exactly why the shortcut audit exists. FailureDNA's value isn't a magic action; it's that it never repeats a known failure and never blindly reuses a stale fix as environments change — the behavior that generalizes beyond a fixed benchmark.

The backend runs as a custom container on Alibaba Cloud Function Compute (FastAPI, port 9000), memory persists in ApsaraDB RDS for PostgreSQL + pgvector (HNSW), and the image lives in ACR Personal Edition. A few things bit, and are worth writing down for the next person:

- The default *.fcapp.run domain forces downloads. It adds Content-Disposition: attachment to HTML and JSON responses, so a browser downloads your dashboard or health JSON instead of rendering it. I serve the UI from GitHub Pages instead.
- Access-Control-* headers are already set in front of the app (the platform even reflects the request origin), so the app's own CORS responsibility is narrow.
- A small /health/ready endpoint, a /health/cors-debug endpoint, and a build marker make "is my new code actually live?" a one-glance check.

The most interesting open problem is the inspect disposition. Today the deterministic gate hard-removes avoid actions but leaves inspect ones available with a warning. The right next step is a real verification tool behind inspect, so that a stale success is checked against the current environment instead of just being flagged. That keeps the thesis intact: let the model be creative where creativity helps, and let deterministic code (and real checks) hold the line where being wrong is expensive.
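The verification step behind inspect is proposed, not built, so the following shows only one possible shape for it. It compares the recorded environment fingerprint with the current one and hands any drift to a real check. EnvFingerprint, drifted_fields, verify_stale_success and the still_applies callback are all hypothetical.

```python
from collections.abc import Callable
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EnvFingerprint:
    # Hypothetical fingerprint: the article names driver, topology and config
    # hash as the fields whose drift turns a past success into "inspect".
    driver: str
    topology: str
    config_hash: str


def drifted_fields(recorded: EnvFingerprint, current: EnvFingerprint) -> list[str]:
    """Names of fingerprint fields that differ between then and now."""
    return [
        f.name
        for f in fields(EnvFingerprint)
        if getattr(recorded, f.name) != getattr(current, f.name)
    ]


def verify_stale_success(
    recorded: EnvFingerprint,
    current: EnvFingerprint,
    still_applies: Callable[[list[str]], bool],
) -> str:
    """Resolve an "inspect" memory into "use" or "avoid".

    still_applies is a hypothetical stand-in for a real check (for example a
    dry run or a config probe). It receives the drifted field names and returns
    True only if the old fix is still safe in the current environment.
    """
    drift = drifted_fields(recorded, current)
    if not drift:
        return "use"
    return "use" if still_applies(drift) else "avoid"
```

In this sketch, a failed check returns avoid. The disproven stale fix then drops out of the candidate list exactly like a known failure, so the hard enforcement stays in deterministic code.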

Try it: Live dashboard · API status · GitHub (MIT)

Built with Qwen Cloud + Alibaba Cloud Function Compute and RDS pgvector.


Read original on dev.to → dev.to/prabhakaranjm/building-failuredna-an-agen…
