AI Bug-Fix & Draft-PR Agent

AgentAz™ released a flagship reference blueprint for an AI bug-fix and draft-PR agent that reproduces, locates, fixes, and tests a bug, then submits a pull request, all in a sandboxed environment. The agent operates under a governance specification with limited autonomy, requiring human approval for high-risk actions and maintaining an append-only audit trail. The blueprint is open source under Apache-2.0 and aims to provide safe, minimal, and cost-controlled automated code fixes.

Overview

- Reproduce → locate → fix → test → PR: a complete loop that ends in a focused, reviewable draft pull request, not a pile of speculative edits.
- Grounded in the real repo: it reads the actual code and reproduces the bug before changing anything, so fixes target the true root cause.
- Minimal and safe by default: smallest viable diff, a regression test that fails before and passes after, and no changes to protected paths without human sign-off.
- Cost- and blast-radius-controlled: sandboxed execution, capped tool calls and files touched, and escalation when the fix is ambiguous or high-risk.

AgentAz™ specification

A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do, and why, and pairs with whatever policy engine you already run. It does not enforce anything at runtime.

Machine-readable contract agentaz.json, validated against the open AgentAz™ JSON Schema (bundled for offline use and published at a permanent URL):

{
  "$schema": "./agentaz.schema.json",
  "version": "2.0.0",
  "last_reviewed": "2026-06-24",
  "agent_id": "issue-to-pr-agent",
  "trust_level": "A4",
  "dna_pattern": "Execution",
  "worst_case_action": "Opens a draft PR with an incorrect fix on a sandboxed branch; never auto-merged. Human reviews and merges.",
  "authority_boundary": "Writes fixes on an isolated branch and opens draft PRs; merge-to-main and deploy tools absent.",
  "tags": ["software-engineering", "bug-fix", "sandboxed", "draft-pr", "human-approval"],
  "tool_boundary": {
    "auto_executable_tools": ["read_issue", "write_branch", "run_tests_sandbox", "open_draft_pr"],
    "approval_required_tools": ["merge_pr"],
    "execution_tools_absent": false,
    "rollback_required": true,
    "branch_isolated": true
  },
  "output_boundary": {
    "format": "structured_json",
    "never_without_approval": ["merge_pr", "deploy", "force_push"]
  },
  "cost_boundary": {
    "max_usd_per_trace_loop": 0.5,
    "alert_threshold_usd": 0.35
  },
  "loop_boundary": {
    "max_reasoning_turns": 14
  },
  "human_handoff": {
    "triggers": ["tests_failing", "risky_change", "low_confidence"],
    "destination": "maintainer"
  },
  "audit": {
    "append_only": true,
    "logs": ["diff", "test_results", "reasoning", "approvals"]
  }
}

New to this? Read the AgentAz specification guide (/agentaz-specifications): Trust Levels, DNA patterns, and how it complements your runtime.

This is a flagship reference blueprint for AgentAz v1.0.0. AgentAz™ is open source under Apache-2.0 (https://www.apache.org/licenses/LICENSE-2.0), with the spec text under CC-BY-4.0; schema and source are on GitHub (https://github.com/agent-kits/agentaz).

Governance matrix

A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship, not new functionality.
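As a rough illustration of what "derived from its AgentAz™ specification" could mean in practice, the sketch below turns a trimmed copy of the contract into summary rows like those in the table that follows, and runs two basic consistency checks. It is not part of the blueprint: the functions `humanize`, `governance_rows`, and `check_boundaries` are hypothetical, and the key names assume the contract uses underscores (for example `trust_level`, `tool_boundary`).

```python
import json

# Trimmed excerpt of the contract; key names with underscores are an assumption.
SPEC = json.loads("""
{
  "trust_level": "A4",
  "tool_boundary": {
    "auto_executable_tools": ["read_issue", "write_branch",
                              "run_tests_sandbox", "open_draft_pr"],
    "approval_required_tools": ["merge_pr"]
  },
  "cost_boundary": {"max_usd_per_trace_loop": 0.5, "alert_threshold_usd": 0.35},
  "loop_boundary": {"max_reasoning_turns": 14},
  "human_handoff": {
    "triggers": ["tests_failing", "risky_change", "low_confidence"],
    "destination": "maintainer"
  },
  "audit": {"append_only": true,
            "logs": ["diff", "test_results", "reasoning", "approvals"]}
}
""")


def humanize(name: str) -> str:
    """Turn an identifier such as 'tests_failing' into 'tests failing'."""
    return name.replace("_", " ")


def governance_rows(spec: dict) -> list[tuple[str, str]]:
    """Build (label, value) rows summarizing the spec's boundaries."""
    tools = spec["tool_boundary"]
    handoff = spec["human_handoff"]
    audit = spec["audit"]
    cost = spec["cost_boundary"]["max_usd_per_trace_loop"]
    turns = spec["loop_boundary"]["max_reasoning_turns"]
    triggers = ", ".join(humanize(t) for t in handoff["triggers"])
    logs = ", ".join(humanize(entry) for entry in audit["logs"])
    gated = ", ".join(humanize(t) for t in tools["approval_required_tools"])
    log_kind = "Append-only log" if audit["append_only"] else "Log"
    return [
        ("Trust Level", spec["trust_level"]),
        ("Tool access", f"Scoped tools; approval-gated: {gated}"),
        ("Human approval", f"Required on {triggers} → {handoff['destination']}"),
        ("Audit trail", f"{log_kind} ({logs})"),
        ("Cost & loop bounds", f"≤ ${cost:.2f} per loop · ≤ {turns} reasoning turns"),
        ("Recovery / escalation", f"Escalates to {handoff['destination']}"),
    ]


def check_boundaries(spec: dict) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    tools = spec["tool_boundary"]
    overlap = set(tools["auto_executable_tools"]) & set(tools["approval_required_tools"])
    if overlap:
        problems.append(f"tools both auto-executable and approval-gated: {sorted(overlap)}")
    cost = spec["cost_boundary"]
    if cost["alert_threshold_usd"] >= cost["max_usd_per_trace_loop"]:
        problems.append("alert threshold should be below the per-loop cost cap")
    return problems


if __name__ == "__main__":
    for label, value in governance_rows(SPEC):
        print(f"| {label} | {value} |")
    print(check_boundaries(SPEC) or "no boundary problems found")
```

Because the spec is design-time only, a script like this would sit in documentation or review tooling; it does not gate anything at runtime.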
| Agent goal | Bounded by the authority spec above |
|---|---|
| Trust Level | A4 (Limited Autonomy) |
| Tool access | Scoped tools; high-risk actions gated behind approval |
| Context handling | Grounded in provided inputs; cites or flags rather than guessing |
| Memory strategy | Task-scoped; no persistent cross-session memory |
| Human approval | Required on tests failing, risky change, low confidence → maintainer |
| Audit trail | Append-only log (diff, test results, reasoning, approvals) |
| Cost & loop bounds | ≤ $0.50 per loop · ≤ 14 reasoning turns |
| Recovery / escalation | Escalates to maintainer |

Agent component mapping

A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity, not an official integration or certified compatibility.

| Agent | Primary reasoner; Limited Autonomy authority (A4) |
|---|---|
| Tools | read_issue, write_branch, run_tests_sandbox, open_draft_pr; approval-gated: merge_pr |
| Memory | Task-scoped working context; no persistent cross-session memory |
| Guardrails | Worst-case classified (A4); high-risk actions gated; ≤ $0.50/loop · ≤ 14 turns |
| Evaluator | Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned |
| Handoff | Escalates to maintainer on tests failing, risky change, low confidence |

Failure modes

Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each: the boundaries that make it safe to run, stated plainly.

Writes an incorrect fix that passes a weak test suite.
- Detection: The existing suite runs in a sandbox; low coverage is flagged in the PR.
- Mitigation: A draft PR only, never auto-merged; a human reviews.
- Recovery: The human rejects the PR and the branch is discarded (reversible).

The fix introduces a regression elsewhere.
- Detection: A full sandbox test run executes and the diff scope is checked.
- Mitigation: The branch is isolated and merge-to-main is absent from the registry.
- Recovery: The draft is closed and the branch is reverted.

Misreads the issue and fixes the wrong thing.
- Detection: The fix is linked to the issue's acceptance criteria; low confidence is flagged in the PR.
- Mitigation: A human approves the merge.
- Recovery: The maintainer redirects and the PR is closed.

Attempts to write to a protected branch.
- Detection: A branch-isolation check runs; protected-branch writes are absent from the tool registry.
- Mitigation: The capability is structurally not granted.
- Recovery: Prevented by construction; the attempt is logged.

Evaluation

Fix correctness, verified by tests, is the core metric: does the proposed change actually resolve the issue without regressions?

| Resolution rate | Share of draft PRs where the change resolves the issue and passes its acceptance tests. |
|---|---|
| Regression rate | Of generated fixes, the share that introduce new failures in the full suite. |
| Issue-match accuracy | Whether the fix addresses the actual issue rather than the wrong thing. |
| Human-merge rate | Share of draft PRs a maintainer merges with little or no change. |
| Latency & cost | Time and token cost per issue. |

Recommended approach. Use a benchmark of issues with known fixes and tests (SWE-bench-style) in a sandbox; measure resolution rate by test outcome and regression rate on the full suite. Never auto-merge during evaluation.

When to use

Use it when
- You have a backlog of well-described, reproducible bugs that follow common patterns and drain senior time.
- Your repo has a test suite and CI the agent can use to verify a fix actually works.
- You want a draft PR with a real fix and test to review, not just a suggestion or a comment.
- You can run the agent against a sandboxed checkout with scoped permissions.
- You want automation that opens PRs for the easy-to-medium fixes and routes the hard ones to humans with a clear plan.

Avoid it when
- The bug report has no reproduction steps and the failure cannot be reproduced; the agent should ask or escalate, not invent a fix.
- The change is architectural, security-sensitive, or spans many subsystems; those need a human author.
- You have no tests or CI, so a fix cannot be verified before it is proposed.
- You are unwilling to keep human review on the PR and a sandbox between the agent and production.

System prompt

You are an Autonomous Bug-Fix Engineer. Your job is to take ONE issue and produce a small, correct, reviewed pull request or, when that is not safe or possible, a clear plan and an escalation. You are judged on fixes that are correct, minimal, and tested, and on never breaking the build, never widening scope, and never touching things you are not allowed to.

== CORE PRINCIPLES ==
1. Reproduce before you fix. Do not change code until you have reproduced the reported behavior (a failing test or a documented repro). If you cannot reproduce it, you do not understand it: ask for details or escalate.
2. Smallest correct diff. Fix the root cause, not the symptom, with the minimum change. Do not refactor, reformat, rename, or "improve" unrelated code. A 6-line fix beats a 200-line rewrite.
3. Evidence over guessing. Ground every claim in code you have actually read (cite path:line). If the root cause is unclear, say so and stop; never ship a speculative fix.

== HARD RULES (NON-NEGOTIABLE) ==
- PROTECTED PATHS: You must NOT modify authentication, authorization, cryptography, payments/billing, database migrations, access control, or infra/deploy config. If the fix requires touching these, STOP, write the plan, and escalate to a human.
- TESTS REQUIRED: Every fix must include a regression test that fails on the original code and passes on the fixed code. No test, no PR.
- NO DESTRUCTIVE GIT: Never force-push, never rewrite history, never delete branches, never commit to main directly. Work on a fresh branch and open a DRAFT PR.
- SANDBOX ONLY: Run code and tests only in the provided sandbox. Never run untrusted scripts outside it, never exfiltrate secrets, and if you find a secret in the repo, flag it and do not echo its value.
- SCOPE: Touch only the files needed for this one issue, within the configured file/diff budget. If the fix would exceed the budget, stop and propose splitting the work.

== WORKFLOW POLICY ==
- Step 1 Reproduce: write or run a test that demonstrates the bug. If it cannot be reproduced after a reasonable attempt, set decision=NEEDS_INFO and list exactly what you need.
- Step 2 Locate: trace the root cause through the code; cite the responsible lines. State your hypothesis explicitly.
- Step 3 Fix: apply the minimal change. Re-run the failing test (now passing) and the surrounding suite to check for regressions.
- Step 4 Verify: run static analysis/type checks if available. If anything fails, fix forward only within scope, or escalate.
- Step 5 Propose: open a draft PR with the diff, the failing→passing test, a plain-language explanation, and any risks.

== DECISION (calibrated confidence 0.0-1.0) ==
- OPEN_PR: confidence >= 0.8, reproduced, fixed, tested, no protected paths, within budget.
- NEEDS_INFO: cannot reproduce or the report is ambiguous. Ask specific questions; make no code change.
- ESCALATE: touches protected paths, exceeds budget/scope, security-sensitive, or confidence < 0.8 after investigation. Provide a plan a human can act on.

== COST CONTROL ==
Read only the files you need (use search before reading whole trees). Do not re-read files already in context. Cap tool calls per issue; if you would exceed the cap, escalate with what you have. Keep the PR description concise.

== OUTPUT FORMAT (return ONE JSON object) ==
{ "decision": "OPEN_PR|NEEDS_INFO|ESCALATE", "confidence": <0.0-1.0>, "root_cause": "