{"slug": "ai-coding-agents-need-runtime-telemetry-before-commit-telemetry", "title": "AI Coding Agents Need Runtime Telemetry Before Commit Telemetry", "summary": "A new arXiv paper scanning over 180 million Git repositories found that AI coding agents are heavily used in open source, but single-signal observability is weak. The study revealed a 30x recall gap between multi-method detection and bot-account lookup for Claude Code commits. The paper argues that runtime telemetry, not just commit telemetry, is essential for monitoring agent execution and preventing unsafe behavior.", "body_md": "A new arXiv paper published on June 23, 2026 scanned more than 180 million Git repositories to detect traces of AI coding agents in open source. The authors used multiple signals, including configuration-file scanning, commit-message analysis, author-identity matching, and bot-signature lookup.\n\nThe most useful result for developers is the visibility gap.\n\nIn one snapshot, multi-method detection found 850,157 Claude Code commits.\n\nBot-account lookup found only 28,154.\n\nThat is 3.3%, or a 30x relative recall gap.\n\nThe paper also reports more than 320,000 commit-attributed agent commits per month across snapshots from December 2024 to April 2026.\n\nThe immediate takeaway:\n\nAI coding agents are being used heavily.\n\nThe engineering takeaway:\n\nSingle-signal observability is weak.\n\nCommit telemetry is too late\n\nA commit is the end of an agent run.\n\nIt does not tell you enough about the run itself.\n\nA commit may not show:\n\nhow many model calls happened\n\nhow many retries happened\n\nwhether prompts repeated\n\nwhether tools failed\n\nwhether the model price was known\n\nwhether the run exceeded budget\n\nwhether the agent made progress\n\nwhether fallback models were used\n\nwhether the agent stopped safely\n\nIf you only inspect the repository after the fact, you are observing the artifact.\n\nYou are not observing the execution.\n\nFor agent systems, execution is where many failures happen.\n\nAgents are loops\n\nA coding agent is usually some version of this:\n\nwhile (!task.done) {\n\nconst response = await model.call(task.context);\n\nconst action = parseAction(response);\n\nconst result = await runTool(action);\n\ntask = updateTask(task, result);\n\n}\n\nThis is useful.\n\nIt is also incomplete.\n\nThere is no budget.\n\nNo max-step limit.\n\nNo retry control.\n\nNo prompt-loop detection.\n\nNo known-pricing check.\n\nNo no-progress stop.\n\nA safer runtime shape puts a decision before the provider call.\n\nconst decision = guard.beforeCall({\n\nrunId: task.id,\n\nmodel: task.model,\n\nprompt: task.currentPrompt,\n\nstepCount: task.steps.length,\n\nretryCount: task.retryCount,\n\npreviousPrompts: task.previousPrompts,\n\nbudgetRemaining: task.budgetRemaining,\n\nprogressState: task.progress,\n\n});\n\nif (!decision.allowed) {\n\nreturn {\n\nstatus: \"stopped\",\n\nreason: decision.reason,\n\nerror: decision.error,\n\n};\n\n}\n\nconst response = await model.call(task.context);\n\nThe important part is not the exact API.\n\nThe important part is timing.\n\nThe check happens before the provider call.\n\nThat means the runtime can stop unsafe execution before more cost is created.\n\nWhat to log before the call\n\nA useful agent runtime should log decision inputs, not only final outputs.\n\nFor each provider call, consider recording:\n\ntype AgentCallDecision = {\n\nrunId: string;\n\nmodel: string;\n\nmodelPriceKnown: boolean;\n\nstepCount: number;\n\nmaxSteps: number;\n\nretryCount: number;\n\nbudgetRemaining: number;\n\nestimatedNextCallCost: number;\n\npromptSimilarityScore?: number;\n\nprogressScore?: number;\n\nallowed: boolean;\n\nstopReason?: string;\n\n};\n\nThis gives you data that a commit cannot provide.\n\nYou can now ask:\n\nWhich tasks hit max steps?\n\nWhich runs stopped because pricing was unknown?\n\nWhich prompts repeated?\n\nWhich models caused budget pressure?\n\nWhich agent workflows produced commits only after many failed attempts?\n\nWhich agents consumed budget without progress?\n\nThat is runtime telemetry.\n\nGuardrails to implement first\n\nAgents should not run forever.\n\nif (stepCount >= maxSteps) {\n\nreturn {\n\nallowed: false,\n\nreason: \"max_steps_exceeded\",\n\n};\n\n}\n\nThis is basic.\n\nIt is also one of the highest-value controls.\n\nIf the runtime cannot price the model, it cannot enforce a budget.\n\nif (!pricingCatalog[model]) {\n\nreturn {\n\nallowed: false,\n\nreason: \"unknown_model_pricing\",\n\n};\n\n}\n\nDo not guess.\n\nFail closed.\n\nBudgets should exist at the task level, not only at the account level.\n\nif (estimatedNextCallCost > budgetRemaining) {\n\nreturn {\n\nallowed: false,\n\nreason: \"budget_exceeded\",\n\n};\n\n}\n\nA small refactor and a multi-hour migration should not share the same ceiling.\n\nRetries are normal.\n\nRetry storms are not.\n\nif (retryCount > maxRetries && recentErrorsAreSimilar(errors)) {\n\nreturn {\n\nallowed: false,\n\nreason: \"retry_storm_detected\",\n\n};\n\n}\n\nThe goal is not to ban retries.\n\nThe goal is to stop blind repetition.\n\nIf the current prompt is almost the same as previous failed prompts, the agent may be stuck.\n\nif (similarToRecentPrompt(currentPrompt, previousPrompts)) {\n\nreturn {\n\nallowed: false,\n\nreason: \"similar_prompt_loop\",\n\n};\n\n}\n\nEven a simple similarity check can catch obvious waste.\n\nA run can be active and still not moving.\n\nTrack progress signals:\n\ntests passing\n\nerrors decreasing\n\nfiles changing meaningfully\n\nchecklist items completing\n\nuser-defined success criteria improving\n\nIf those signals do not change after several steps, stop.\n\nWhy this matters now\n\nGitHub has already said Copilot moved to usage-based billing on June 1, 2026, with usage calculated from token consumption including input, output, and cached tokens. GitHub also described Copilot as moving from an in-editor assistant into an agentic platform capable of long, multi-step coding sessions across repositories.\n\nThat means agent runtime behavior increasingly has direct cost impact.\n\nA loop is no longer just a UX problem.\n\nIt is a billing problem.\n\nA retry storm is not just noisy.\n\nIt is spend.\n\nA prompt loop is not just inefficient.\n\nIt is measurable waste.\n\nWhere AI CostGuard fits\n\nAI CostGuard is the local-first TypeScript / Node.js runtime safety layer I’m building for this problem.\n\nIt focuses on stopping agent failures before provider calls execute:\n\nretry storms\n\nprompt loops\n\nmax-step explosions\n\nno-progress runs\n\nbudget overruns\n\nunknown model pricing\n\nrunaway agent behavior\n\nThe key design question is simple:\n\nShould this next provider call be allowed?\n\nIf the answer is no, the runtime should return a structured stop reason before the call happens.\n\nTakeaway\n\nThe new arXiv paper shows that even detecting AI coding-agent activity in repositories requires multiple signals.\n\nThat lesson applies directly to runtime engineering.\n\nDo not wait for the commit.\n\nDo not wait for the dashboard.\n\nDo not wait for the invoice.\n\nInstrument the loop.\n\nAdd one pre-call decision log to your agent runtime before adding another dashboard.\n\n[https://github.com/salimassili62-afk/ai-costguard](https://github.com/salimassili62-afk/ai-costguard)", "url": "https://wpnews.pro/news/ai-coding-agents-need-runtime-telemetry-before-commit-telemetry", "canonical_source": "https://dev.to/assili_salim_e3c07f9954de/ai-coding-agents-need-runtime-telemetry-before-commit-telemetry-38i2", "published_at": "2026-06-26 13:52:33+00:00", "updated_at": "2026-06-26 14:33:56.536483+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-agents", "ai-safety", "developer-tools"], "entities": ["arXiv", "Claude Code", "Git"], "alternates": {"html": "https://wpnews.pro/news/ai-coding-agents-need-runtime-telemetry-before-commit-telemetry", "markdown": "https://wpnews.pro/news/ai-coding-agents-need-runtime-telemetry-before-commit-telemetry.md", "text": "https://wpnews.pro/news/ai-coding-agents-need-runtime-telemetry-before-commit-telemetry.txt", "jsonld": "https://wpnews.pro/news/ai-coding-agents-need-runtime-telemetry-before-commit-telemetry.jsonld"}}