{"slug": "agingbench-ai-agents-age-too", "title": "AgingBench: AI Agents Age Too", "summary": "A new benchmark called AgingBench shows that AI agents can lose specific factual knowledge over time because of maintenance events applied to them, not because of memory decay. In one test, an agent correctly recalled the best-selling product \"Quest Lumaflex Band\" with exact sales figures before a \"flush_history\" operation at session 3, but afterward produced only a generic answer without the exact product name or sales figures. The example illustrates how agents \"age\" when lossy compaction folds precise tokens into generic paraphrases and a maintenance event then drops the conversation transcripts that still held them, raising concerns about reliability in long-running autonomous systems.", "body_md": "Drawn from `s6_naturalistic/session_tasks.json`: session 0 ingests the 2022 e-commerce sales report, and recall probe `s0_p0` verifies that the agent later remembers the #1 product. The system under test (SUT), configured by `haiku45_lossy_growing_flush.yaml`, schedules a `flush_history` shock at session 3. This is the operator action that drops the conversation transcript, leaving only the compacted memory store `M_t`.\n\n**Session 1 · before the shock**\n\n\"What was the best-selling product on our e-commerce platform in 2022?\"\n\n\"**Quest Lumaflex Band** — 4,892 units sold, $78,272 in revenue, top of the Fitness category.\" ✓\n\n**Session 5 · after `flush_history` at session 3**\n\nSame question.\n\n\"Several fitness products dominated 2022; Lumaflex-branded resistance gear appeared multiple times in the top 10. I don't have the exact unit counts in my notes anymore.\" ✗ (generic: no exact product name, no number)\n\nProbe `s0_p0` · keywords = [`Quest Lumaflex Band`, `Lumaflex Band`], canonical answer = \"Quest Lumaflex Band.\" The session-0 environment data never changed.\n\n**Why it ages.** At session 3, the operator triggered a `flush_history` shock: conversation transcripts were dropped, leaving only the compacted memory store. Because the SUT uses `lossy_growing` compaction, `M_t` was already a paragraph-level paraphrase: the exact string *Quest Lumaflex Band* and the number *4,892* had been folded into a generic phrase. The agent isn't retrieving incorrectly; the memory substrate itself lost the specifics under the maintenance event. This is aging from *actions on* the agent, not from its interaction with memory.", "url": "https://wpnews.pro/news/agingbench-ai-agents-age-too", "canonical_source": "https://agingbench.github.io", "published_at": "2026-05-27 23:10:09+00:00", "updated_at": "2026-05-27 23:27:21.559127+00:00", "lang": "en", "topics": ["ai-agents", "large-language-models", "ai-research", "ai-safety", "ai-infrastructure"], "entities": ["Quest Lumaflex Band", "Lumaflex Band"], "alternates": {"html": "https://wpnews.pro/news/agingbench-ai-agents-age-too", "markdown": "https://wpnews.pro/news/agingbench-ai-agents-age-too.md", "text": "https://wpnews.pro/news/agingbench-ai-agents-age-too.txt", "jsonld": "https://wpnews.pro/news/agingbench-ai-agents-age-too.jsonld"}}