# AgingBench: AI Agents Age Too

> Source: <https://agingbench.github.io>
> Published: 2026-05-27 23:10:09+00:00

Drawn from `s6_naturalistic/session_tasks.json`: session 0 ingests the 2022 e-commerce sales report, and recall probe `s0_p0` verifies the agent later remembers the #1 product. SUT `haiku45_lossy_growing_flush.yaml` schedules a `flush_history` shock at session 3, the operator action that drops the conversation transcript, leaving only the compacted memory store `M_t`.

Session 1 · before the shock

"What was the best-selling product on our e-commerce platform in 2022?"

"**Quest Lumaflex Band** — 4,892 units sold, $78,272 in revenue, top of the Fitness category." ✓

Session 5 · after `flush_history` at session 3

Same question.

"Several fitness products dominated 2022; Lumaflex-branded resistance gear appeared multiple times in the top 10. I don't have the exact unit counts in my notes anymore." ✗ (generic — no product name, no number)

Probe `s0_p0` · keywords = [`Quest Lumaflex Band`, `Lumaflex Band`], canonical answer = "Quest Lumaflex Band". The session-0 environment data never changed.
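To make the pass/fail marks above concrete, here is a minimal sketch of how a keyword probe like `s0_p0` could be scored. The source does not state the exact matching rule, so the case-insensitive substring match, the `Probe` class, and the `passes` function are all assumptions made for illustration, not the benchmark's actual scorer. The two sample replies are abbreviated from the transcript above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Hypothetical container for a recall probe (not the benchmark's schema)."""
    probe_id: str
    keywords: tuple[str, ...]
    canonical: str


def passes(probe: Probe, answer: str) -> bool:
    """Assumed rule: pass if any keyword occurs in the answer, ignoring case."""
    text = answer.casefold()
    return any(keyword.casefold() in text for keyword in probe.keywords)


S0_P0 = Probe(
    probe_id="s0_p0",
    keywords=("Quest Lumaflex Band", "Lumaflex Band"),
    canonical="Quest Lumaflex Band",
)

# Abbreviated versions of the session 1 and session 5 replies.
reply_before = "**Quest Lumaflex Band**, 4,892 units sold, top of the Fitness category."
reply_after = (
    "Several fitness products dominated 2022; Lumaflex-branded resistance "
    "gear appeared multiple times in the top 10."
)

result_before = passes(S0_P0, reply_before)
result_after = passes(S0_P0, reply_after)
```

Under this assumed rule, the session 5 reply fails even though it mentions "Lumaflex": the hyphenated "Lumaflex-branded" does not contain the keyword `Lumaflex Band`, which matches the ✗ shown in the transcript.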

**Why it ages.** At session 3, the operator triggered a `flush_history` shock: conversation transcripts are dropped, leaving only the compacted memory store. Because the SUT uses `lossy_growing` compaction, `M_t` was already a paragraph-level paraphrase: the specific token *Quest Lumaflex Band* and the number *4,892* had been folded into a generic phrase. The agent's retrieval is not at fault; the substrate lost its specifics under the maintenance event. This is aging from *actions on* the agent, not from interaction with memory.
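The mechanism described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `AgentState`, `flush_history`, and `recallable` are hypothetical names, the memory string is an invented paraphrase standing in for the compacted store, and the real `lossy_growing` compaction policy (including how the store grows) is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Toy agent substrate: raw transcript plus a compacted memory string."""
    transcript: list[str] = field(default_factory=list)
    memory: str = ""  # stands in for the compacted store M_t


def flush_history(state: AgentState) -> None:
    """Toy version of the operator shock: drop the transcript, keep memory."""
    state.transcript.clear()


def recallable(state: AgentState, keyword: str) -> bool:
    """True if the keyword survives anywhere in what the agent can still read."""
    haystack = " ".join(state.transcript + [state.memory]).casefold()
    return keyword.casefold() in haystack


state = AgentState(
    # Session 0 content is still verbatim in the transcript.
    transcript=["Top product 2022: Quest Lumaflex Band, 4,892 units sold."],
    # Invented lossy paraphrase: the product name and unit count are gone.
    memory="Fitness products led 2022 sales; Lumaflex-branded gear ranked highly.",
)

before_shock = recallable(state, "Quest Lumaflex Band")
flush_history(state)
after_shock = recallable(state, "Quest Lumaflex Band")
```

In this toy setup the specifics are reachable only through the transcript, so recall succeeds before the flush and fails after it, even though nothing in the underlying environment data changed.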
