arXiv:2606.02875v1. Abstract: Coding-agent benchmarks evaluate whether a single uninterrupted agent can resolve a repository issue. Real software work is messier: tasks are interrupted, reassigned, reviewed, and resumed from partial states left by another agent or engineer. We study this missing dimension through "handoff debt": the rediscovery cost imposed when a predecessor's work is opaque or incomplete. Our takeover protocol interrupts a coding agent at deterministic handoff points, freezes the repository, and evaluates successor agents under four handoff views: repository state only, raw trace, summary notes, and structured notes. Across 75 source tasks, the protocol generates 181 handoff-point tasks and 724 takeover runs per successor model. Across three successor models, context-bearing handoffs reduce median agent events by 20-59% and cumulative prompt tokens by 42-63% relative to repository-only takeover. Solved-rate effects are smaller and model-dependent, but efficiency gains are consistent. These findings suggest that coding-agent evaluation should report not only whether a task is solved, but also how costly that work is for another agent to resume.
Handoff Debt: The Rediscovery Cost When Coding Agents Take Over Interrupted Tasks
A new study introduces "handoff debt," the rediscovery cost incurred when a coding agent resumes an interrupted task whose predecessor's work is opaque or incomplete. From 75 source tasks, the takeover protocol derives 181 handoff-point tasks, each evaluated under four handoff views (repository state only, raw trace, summary notes, and structured notes), for 181 × 4 = 724 takeover runs per successor model. Across three successor models, context-bearing handoffs reduced median agent events by 20-59% and cumulative prompt tokens by 42-63% compared to repository-only takeovers. Effects on solved rate were smaller and model-dependent, but the efficiency gains were consistent. The findings argue that coding-agent benchmarks should measure not just task completion, but also the cost of resuming work from a predecessor's partial state.
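To make the run count and the headline metric concrete, here is a minimal Python sketch, not the authors' implementation. It assumes every handoff point is paired with each of the four handoff views, and that the reported reductions compare the median of a context-bearing view against the repository-only median. The names `HandoffView`, `TakeoverRun`, `enumerate_takeover_runs`, and `relative_reduction` are hypothetical, and the event counts in the usage lines are made up for illustration rather than taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product
from statistics import median


class HandoffView(Enum):
    # The four handoff views named in the abstract.
    REPO_ONLY = "repository state only"
    RAW_TRACE = "raw trace"
    SUMMARY_NOTES = "summary notes"
    STRUCTURED_NOTES = "structured notes"


@dataclass(frozen=True)
class TakeoverRun:
    handoff_point_id: int  # hypothetical id for one frozen repository snapshot
    view: HandoffView      # what the successor agent is shown


def enumerate_takeover_runs(num_handoff_points: int) -> list[TakeoverRun]:
    """Pair every handoff point with every handoff view (assumed full cross)."""
    return [
        TakeoverRun(point_id, view)
        for point_id, view in product(range(num_handoff_points), HandoffView)
    ]


def relative_reduction(baseline: list[float], treatment: list[float]) -> float:
    """Fractional drop in the median of `treatment` relative to `baseline`."""
    base = median(baseline)
    return (base - median(treatment)) / base


# 181 handoff points x 4 views gives the per-model run count from the abstract.
runs = enumerate_takeover_runs(181)
print(len(runs))  # 724

# Made-up agent-event counts, for illustration only (not data from the paper).
repo_only_events = [100, 120, 80]
summary_notes_events = [60, 50, 70]
print(relative_reduction(repo_only_events, summary_notes_events))  # 0.4
```

A reduction of 0.4 here would correspond to a 40% drop in median agent events. The paper may aggregate differently (for example, per task pair rather than across pooled runs), so treat the median-of-pool comparison as one plausible reading.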