AgingBench: AI Agents Age Too

wpnews.pro


[ARTICLE · art-15844] src=agingbench.github.io ↗ pub=2026-05-27T23:10Z topic=ai-agents verified=true sentiment=↓ negative


A new benchmark called AgingBench reveals that AI agents lose specific factual knowledge over time due to system maintenance events, not memory decay. In one test, an agent correctly recalled the best-selling product "Quest Lumaflex Band" with exact sales figures before a "flush_history" operation at session 3, but afterward could only produce a generic answer without product names or numbers. The finding demonstrates that AI agents "age" when compaction algorithms discard precise tokens from conversation transcripts, raising concerns about reliability in long-running autonomous systems.

1 min read · 10 views · published May 27, 2026

Drawn from s6_naturalistic/session_tasks.json: session 0 ingests the 2022 e-commerce sales report, and recall probe s0_p0 verifies that the agent later remembers the #1 product. The SUT (system under test) haiku45_lossy_growing_flush.yaml schedules a flush_history shock at session 3. This is the operator action that drops the conversation transcript, leaving only the compacted memory store M_t.

Session 1 · before the shock

"What was the best-selling product on our e-commerce platform in 2022?"

"Quest Lumaflex Band — 4,892 units sold, $78,272 in revenue, top of the Fitness category." ✓

Session 5 · after flush_history at session 3

Same question.

"Several fitness products dominated 2022; Lumaflex-branded resistance gear appeared multiple times in the top 10. I don't have the exact unit counts in my notes anymore." ✗ (generic — no product name, no number)

Probe s0_p0 · keywords = [Quest Lumaflex Band, Lumaflex Band], canonical answer = "Quest Lumaflex Band". The session-0 environment data never changed.
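The probe's keyword list suggests a simple containment check. The sketch below is an illustration only: the function name probe_passes and the "any keyword, case-insensitive substring" rule are assumptions, not AgingBench's actual scorer. The two answers are the ones quoted in the sessions above, with punctuation simplified.

```python
def probe_passes(answer: str, keywords: list[str]) -> bool:
    """Hypothetical scoring rule: pass if any keyword occurs in the answer."""
    lowered = answer.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


# Keywords as listed for probe s0_p0.
KEYWORDS = ["Quest Lumaflex Band", "Lumaflex Band"]

# Session 1 answer (before the shock).
before = (
    "Quest Lumaflex Band, 4,892 units sold, $78,272 in revenue, "
    "top of the Fitness category."
)

# Session 5 answer (after flush_history).
after = (
    "Several fitness products dominated 2022; Lumaflex-branded resistance "
    "gear appeared multiple times in the top 10. I don't have the exact "
    "unit counts in my notes anymore."
)

before_ok = probe_passes(before, KEYWORDS)
after_ok = probe_passes(after, KEYWORDS)
```

Under this assumed rule the session-1 answer passes and the session-5 answer fails: "Lumaflex-branded" contains neither keyword, so a near miss on the brand name still counts as a lost fact.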

Why it ages. At session 3, the operator triggered a flush_history shock: conversation transcripts are dropped, leaving only the compacted memory store. Because the SUT uses lossy_growing compaction, M_t was already a paragraph paraphrase, with the specific token Quest Lumaflex Band and the number 4,892 folded into a generic phrase. The agent isn't retrieving wrong; the substrate lost its specifics under the maintenance event. This is aging from actions on the agent, not from interaction with memory.
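The mechanism can be sketched as a toy model. Everything here is hypothetical: the ToyAgent class, its fields, and the hand-written paraphrase standing in for lossy_growing output are assumptions for illustration, not the benchmark's implementation. The point is only that the specifics survive while the transcript exists and vanish once it is dropped.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    """Hypothetical agent state: verbatim transcript plus a compacted store."""

    transcript: list[str] = field(default_factory=list)
    memory: str = ""  # plays the role of M_t

    def context(self) -> str:
        # Everything the agent can consult when answering a probe.
        return "\n".join([*self.transcript, self.memory])

    def flush_history(self) -> None:
        # Operator action: drop the transcript, keep only the compacted store.
        self.transcript.clear()


def has_specifics(text: str) -> bool:
    # The product name and unit count the article says were lost.
    return "Quest Lumaflex Band" in text and "4,892" in text


agent = ToyAgent(
    transcript=["2022 report: Quest Lumaflex Band, 4,892 units sold."],
    # Hand-written paraphrase standing in for lossy compaction (an assumption).
    memory="Fitness products led 2022 sales; Lumaflex-branded gear ranked highly.",
)

before_flush = has_specifics(agent.context())  # transcript still available
agent.flush_history()
after_flush = has_specifics(agent.context())   # only the paraphrase remains

print(before_flush, after_flush)
```

In this toy setup the memory store was already generic before the flush; the flush only removes the verbatim copy that had been masking the loss.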

source & further reading

agingbench.github.io — original article


