{"slug": "s3mem-structured-spatiotemporal-scene-event-memory-for-long-horizon-interactive", "title": "S3Mem: Structured Spatiotemporal Scene-Event Memory for Long-Horizon Interactive Question Answering", "summary": "Researchers have developed S3MEM, a structured memory framework that improves long-horizon interactive question answering by converting agent trajectories into query-aligned evidence. In tests across four environments, S3MEM outperformed standard retrieval-augmented generation and several adapted baselines, achieving higher accuracy while using fewer evidence tokens. The findings suggest that structured writing and anchor-sensitive evidence routing offer a stronger accuracy-efficiency balance for long-horizon interactive QA than generic memory interfaces.", "body_md": "arXiv:2605.28831v1\nAbstract: Long-horizon interactive agents often accumulate large trajectory histories yet still fail to answer questions about earlier events reliably. We argue that the main bottleneck is not context length alone, but the trajectory-to-answer interface of long-term memory. When histories are stored as plain-text chunks and queried with standard retrieval-augmented generation (RAG), systems often retrieve locally relevant but chain-incomplete evidence, especially for spatial, temporal, repeated-event, and multi-hop state questions. We propose S3MEM, a structured scene-event episodic memory framework for long-horizon interactive question answering (QA). S3MEM writes trajectories into structured memory units, retrieves evidence through anchor-sensitive retrieval, and exposes a compact token-budget-aware evidence interface for answer-time inference. In this sense, S3MEM is a structured evidence harness that converts agent trajectories into query-aligned support. We evaluate S3MEM on two internal headline environments (Crafter, Jericho) and two out-of-family environments (SciWorld, ALFWorld). 
Under a shared frozen answer-time protocol, S3MEM consistently outperforms Vanilla RAG across all four environments, surpasses Graph-NoReader on Crafter, Jericho, and ALFWorld, and matches it on SciWorld while using dramatically fewer evidence tokens. Three adapted recent baselines -- A-MEM-inspired, MemoryOS-adapted, and LightMem-adapted -- improve over Vanilla RAG in several settings, but none matches S3MEM's overall accuracy-efficiency frontier. Overall, the evidence supports a bounded conclusion: under the current frozen answer-time protocol, structured writing and anchor-sensitive evidence routing provide a stronger accuracy-efficiency frontier for long-horizon interactive QA than more generic memory interfaces.", "url": "https://wpnews.pro/news/s3mem-structured-spatiotemporal-scene-event-memory-for-long-horizon-interactive", "canonical_source": "https://arxiv.org/abs/2605.28831", "published_at": "2026-05-29 04:00:00+00:00", "updated_at": "2026-05-29 04:24:44.777067+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "natural-language-processing", "ai-agents"], "entities": ["S3MEM", "Crafter", "Jericho", "SciWorld", "ALFWorld", "A-MEM", "MemoryOS", "LightMem"], "alternates": {"html": "https://wpnews.pro/news/s3mem-structured-spatiotemporal-scene-event-memory-for-long-horizon-interactive", "markdown": "https://wpnews.pro/news/s3mem-structured-spatiotemporal-scene-event-memory-for-long-horizon-interactive.md", "text": "https://wpnews.pro/news/s3mem-structured-spatiotemporal-scene-event-memory-for-long-horizon-interactive.txt", "jsonld": "https://wpnews.pro/news/s3mem-structured-spatiotemporal-scene-event-memory-for-long-horizon-interactive.jsonld"}}