04:00
2026-06-06
arxiv.org
ai-agents
SentinelBench: A Benchmark for Long-Running Monitoring Agents
Researchers have introduced SentinelBench, an open-source benchmark designed to evaluate AI agents on long-running monitoring tasks that require sustained attention rather than continuous action. The …