{"slug": "i-audited-162-agent-written-prs-27-were-the-ai-fixing-itself", "title": "I audited 162 agent-written PRs – 27% were the AI fixing itself", "summary": "Commensa released an open-source tool, commensa-audit, that analyzes GitHub repositories to measure the percentage of engineering effort spent correcting AI-generated code. The company's own audit of 162 PRs found that 27% were the AI fixing itself, highlighting the need to distinguish progress from cleanup in agent-era development.", "body_md": "**What % of your AI engineering effort went to fixing your AI's own work?**\n\n`commensa-audit`\n\nanswers that from your git history. Point it at a GitHub repo; get a one-page report:\n\n**Rework tax**— share of PRs (and changed lines) that corrected earlier work, vs. net-new value** Superseded work**— PRs whose output was entirely replaced later (shown separately — discarded ≠ correcting)** Abandoned attempts**— PRs closed without merging: the waste merge-based metrics never see** Churn clusters**— chains of PRs rewriting each other (\"it took 10 PRs to get dark mode right\")** Line survival**— how much merged code is still alive at the end of the window** Hotspots**— rework share by module, against the repo-wide rate** Agent-marked share**— \"at least X% of PRs carry agent markers\" (Co-Authored-By trailers, body signatures) — a stated lower bound, never an attribution claim\n\nWe built it because we needed it: our own agent-built product shipped 162 PRs in 13 days, and the audit showed **27% of them were the AI correcting itself**.\n\n```\npip install commensa-audit\ncommensa-audit --repo owner/name --token $GH_TOKEN\n```\n\nOr straight from source:\n\n```\npip install git+https://github.com/commensa-ai/commensa-audit\n```\n\nOutput: `report_<repo>.html`\n\n(self-contained, forwardable), `audit_<repo>.json`\n\n(raw numbers), `units.csv`\n\n(per-PR data).\n\nBy default the audit covers the **newest 500 PRs** — a safety cap so a naive run on a huge repo stays fast and bounded. When it truncates, the run prints a notice telling you how to raise it. Two optional flags control the window (both newest-first):\n\n```\ncommensa-audit --repo owner/name --since 2026-03-14 --max-prs 150\n```\n\n`--since YYYY-MM-DD`\n\n— only PRs created on/after this UTC date`--max-prs N`\n\n— cap to the N newest PRs (default`500`\n\n; use`--max-prs 0`\n\nfor no cap)\n\nBoth early-stop pagination, so `--max-prs 150`\n\ncosts ~150 PRs' worth of API calls, not the repo's entire history. Run with no flags on a repo under 500 PRs and you get everything, exactly as before.\n\n**Read-only.** GET requests only; a token with read scope is sufficient.**Local-first.** Everything runs and stays on your machine. No telemetry, no phone-home, nothing leaves your network.**Inspectable.** Pure Python, stdlib +`requests`\n\n+`jinja2`\n\n. Read every line before you run it.\n\nEvery PR is classified by a transparent signal cascade — explicit corrective titles/reverts → self-correction (a PR predominantly undoing lines added in the prior N days) → churn-cluster membership → otherwise generative. Every classification in the output carries the signal that fired and a human-readable why. Thresholds live in one config block; tune them and re-run offline with `--reuse`\n\n.\n\nKnown limits (also printed in the report footer): classification is heuristic; squash merges blur attribution; survival windows mean young repos read optimistic; agent-marked share is a lower bound — absence of a marker is not evidence of human authorship. We grade our own certainty rather than fake precision — that's the whole point of the project.\n\nAgent-era teams measure activity — PRs merged, lines shipped, velocity. None of that distinguishes progress from cleanup. The rework tax does: it's the share of motion that was correction, the closest git-only proxy for \"how well was this work directed?\" It won't tell you everything (cost-per-outcome needs token data git doesn't have — that's [what we're building next](https://commensa.ai)) — but it's the most honest first number, and it's free.\n\nThis tool is a snapshot. [Commensa](https://commensa.ai) is the trendline: continuous rework measurement by team and module, alerts, monthly executive reports — and the cost side git can't see, captured at the agent harness. First 25 companies: founding-partner terms.\n\n**measure the durable work, not the noise.**\n\nMIT licensed.", "url": "https://wpnews.pro/news/i-audited-162-agent-written-prs-27-were-the-ai-fixing-itself", "canonical_source": "https://github.com/commensa-ai/commensa-audit", "published_at": "2026-06-13 15:01:46+00:00", "updated_at": "2026-06-13 15:17:26.673464+00:00", "lang": "en", "topics": ["ai-agents", "developer-tools", "ai-tools", "ai-products", "mlops"], "entities": ["Commensa", "GitHub", "commensa-audit"], "alternates": {"html": "https://wpnews.pro/news/i-audited-162-agent-written-prs-27-were-the-ai-fixing-itself", "markdown": "https://wpnews.pro/news/i-audited-162-agent-written-prs-27-were-the-ai-fixing-itself.md", "text": "https://wpnews.pro/news/i-audited-162-agent-written-prs-27-were-the-ai-fixing-itself.txt", "jsonld": "https://wpnews.pro/news/i-audited-162-agent-written-prs-27-were-the-ai-fixing-itself.jsonld"}}