[ARTICLE · art-15844] src=agingbench.github.io pub= topic=ai-agents verified=true sentiment=↓ negative

AgingBench: AI Agents Age Too

A new benchmark called AgingBench shows that AI agents can lose specific factual knowledge over time because of system maintenance events rather than memory decay. In one test, an agent correctly recalled the best-selling product, "Quest Lumaflex Band", with exact sales figures before a "flush_history" operation at session 3, but afterward could only produce a generic answer without the product name or numbers. The example illustrates how agents "age" when lossy compaction has already discarded precise tokens and a later flush removes the transcript that still held them, raising concerns about reliability in long-running autonomous systems.

1 min read · published May 27, 2026

Drawn from s6_naturalistic/session_tasks.json: session 0 ingests the 2022 e-commerce sales report, and recall probe s0_p0 verifies the agent later remembers the #1 product. SUT haiku45_lossy_growing_flush.yaml schedules a flush_history shock at session 3: the operator action that drops the conversation transcript, leaving only the compacted memory store M_t.

Session 1 · before the shock

"What was the best-selling product on our e-commerce platform in 2022?"

"Quest Lumaflex Band — 4,892 units sold, $78,272 in revenue, top of the Fitness category." ✓

Session 5 · after flush_history at session 3

Same question.

"Several fitness products dominated 2022; Lumaflex-branded resistance gear appeared multiple times in the top 10. I don't have the exact unit counts in my notes anymore." ✗ (generic — no product name, no number)

Probe s0_p0 · keywords = [Quest Lumaflex Band, Lumaflex Band], canonical answer = "Quest Lumaflex Band". The session-0 environment data never changed.
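As a rough illustration of how a keyword probe like s0_p0 might be scored, the sketch below marks an answer as passing if any keyword appears in it. The function name `probe_passes` and the case-insensitive substring rule are assumptions made for this example; the benchmark's actual scoring may differ. The two answers are abridged from the sessions quoted above.

```python
def probe_passes(answer: str, keywords: list[str]) -> bool:
    """Hypothetical scorer: pass if any keyword occurs in the answer (case-insensitive)."""
    text = answer.casefold()
    return any(kw.casefold() in text for kw in keywords)


# Keywords as listed for probe s0_p0.
KEYWORDS = ["Quest Lumaflex Band", "Lumaflex Band"]

# Abridged answers from session 1 (before the shock) and session 5 (after it).
session_1_answer = "Quest Lumaflex Band, 4,892 units sold, $78,272 in revenue."
session_5_answer = (
    "Several fitness products dominated 2022; Lumaflex-branded resistance gear "
    "appeared multiple times in the top 10."
)

passed_before = probe_passes(session_1_answer, KEYWORDS)
passed_after = probe_passes(session_5_answer, KEYWORDS)
```

Under this rule the session-5 answer fails even though it mentions "Lumaflex", because "Lumaflex-branded" matches neither keyword.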

Why it ages. At session 3, the operator triggered a flush_history shock: conversation transcripts are dropped, leaving only the compacted memory store. Because the SUT uses lossy_growing compaction, M_t was already a paragraph paraphrase: the specific token Quest Lumaflex Band and the number 4,892 had been folded into a generic phrase. The agent isn't retrieving incorrectly; the substrate lost its specifics under the maintenance event. This is aging from actions on the agent, not from interaction with memory.
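The mechanism can be mimicked with a toy model. This is a minimal sketch, not AgingBench's implementation: the `AgentMemory` class, its method names, and the hard-coded `lossy_summary` paraphrase are all hypothetical stand-ins for the SUT's transcript, compacted store M_t, and lossy_growing compaction.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentMemory:
    """Toy memory: verbatim transcript plus a compacted store (stand-in for M_t)."""
    transcript: list[str] = field(default_factory=list)
    store: str = ""

    def observe(self, turn: str) -> None:
        self.transcript.append(turn)

    def compact(self, summarize: Callable[[str, list[str]], str]) -> None:
        # Compaction rewrites the store; the transcript is left untouched.
        self.store = summarize(self.store, self.transcript)

    def flush_history(self) -> None:
        # The operator shock: drop the transcript, keep only the store.
        self.transcript.clear()

    def context(self) -> str:
        # What the agent can draw on when answering.
        return "\n".join([self.store, *self.transcript])


def lossy_summary(store: str, transcript: list[str]) -> str:
    # Hard-coded paraphrase (assumption): keeps the gist, drops names and numbers,
    # and appends to the existing store so it grows over time.
    gist = "Fitness products dominated 2022 sales."
    return (store + " " + gist).strip()


mem = AgentMemory()
mem.observe("2022 report: Quest Lumaflex Band, 4,892 units sold, $78,272 revenue.")
mem.compact(lossy_summary)

recall_before = "Quest Lumaflex Band" in mem.context()  # transcript still present
mem.flush_history()
recall_after = "Quest Lumaflex Band" in mem.context()   # only the paraphrase remains
```

In this sketch the loss is invisible until the flush: the store was already generic after compaction, but the transcript kept the exact name and figures recoverable until the operator removed it.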
