{"slug": "llm-reviewers-are-useful-but-some-pr-checks-should-stay-deterministic", "title": "LLM reviewers are useful, but some PR checks should stay deterministic", "summary": "A developer proposes splitting code review for AI-generated pull requests into judgment (LLM reviewers) and evidence (deterministic checks). The developer argues that certain checks, such as verifying file boundaries, workflow privilege changes, agent behavior file edits, and test evidence, should be deterministic and repeatable before merge. The approach aims to catch suspicious changes like instruction drift or tool mistakes in AI-generated PRs.", "body_md": "AI coding agents are getting better at opening pull requests.\n\nThat changes the review problem.\n\nA normal review asks whether the code looks correct, whether the design makes sense, and whether the edge cases were considered.\n\nThose questions still matter.\n\nBut an AI-generated pull request also raises a different kind of question:\n\nDid the agent change something outside the intended task, and is there enough repeatable evidence to merge?\n\nI have started thinking about this as a split between **judgment** and **evidence**.\n\nLLM reviewers help with judgment. Agent Gate verifies deterministic merge evidence.\n\nI do not think every review question should become a hard CI gate. Some parts of code review need human context. Some parts benefit from an LLM noticing suspicious patterns. But a few checks are mechanical enough that I want them to be deterministic, repeatable, and visible before merge.\n\nThis is the checklist I currently use when thinking about AI-generated PRs.\n\nThe first question is not whether the code is good.\n\nIt is whether the PR changed the files it was supposed to change.\n\nFor a human PR, an unrelated edit may be easy to explain in review. 
For an agent PR, unrelated edits are more suspicious because they may reflect instruction drift, a tool mistake, or a broad refactor the maintainer did not ask for.\n\nA simple contract can help:\n\n```\nThis PR is allowed to touch:\n- src/auth/**\n- tests/auth/**\n```\n\nThen the review can ask a deterministic question:\n\n```\nDid the PR touch anything outside those paths?\n```\n\nThat does not prove the code is correct. It only proves the PR stayed inside its declared boundary.\n\nThat boundary still matters.\n\nGitHub Actions workflows are one of the highest-risk places for an agent to edit.\n\nA small source change and a workflow permission change do not have the same risk profile.\n\nFor example, I would want a very visible warning if a PR adds or changes this:\n\n```\npermissions:\n  contents: write\n```\n\nor starts using `secrets.*` in a new workflow path.\n\nThis is not a semantic code review problem. It is a policy boundary problem.\n\nThe question is deterministic:\n\n```\nDid this PR increase workflow privileges or introduce a dangerous workflow pattern?\n```\n\nThat kind of check should not depend on whether an LLM happened to notice it in a comment.\n\nAI coding agents often depend on files that shape future behavior:\n\n```\nAGENTS.md\nCLAUDE.md\n.github/copilot-instructions.md\n.cursor/rules/**\n.mcp.json\n```\n\nA change to these files can affect future agent runs, tool access, or repo-specific instructions.\n\nThat makes them different from normal documentation changes.\n\nIf an AI-generated PR edits `.mcp.json` or `AGENTS.md`, I want that surfaced clearly before merge, even if the source code diff looks harmless.\n\nThe deterministic question is:\n\n```\nDid the PR change files that control future agent behavior?\n```\n\nThis is especially important for teams adopting coding agents across repositories, because the control plane can drift quietly.\n\nTest evidence is tricky.\n\nA changed test file does not prove the behavior is 
correct. It does not prove the test is meaningful. It does not prove coverage.\n\nBut for risky areas, the absence of any matching test change is still useful evidence.\n\nIf a PR changes auth logic, payment handling, session middleware, or a migration, I want to know whether the PR also changed a related test file.\n\nThe check should be phrased carefully:\n\n```\nThere is no matching test-file evidence.\n```\n\nNot:\n\n```\nThis PR is untested.\n```\n\nThat distinction matters. Deterministic checks should say exactly what they know, and no more.\n\nThis is not always the first rule I would add, but it is one I keep coming back to.\n\nPackage manifests and lockfiles can hide meaningful risk:\n\n```\npackage.json\npnpm-lock.yaml\nyarn.lock\npackage-lock.json\n```\n\nSome changes are normal dependency maintenance. Others deserve more attention:\n\n```\n{\n  \"scripts\": {\n    \"postinstall\": \"node scripts/setup.js\"\n  }\n}\n```\n\nFor AI-generated PRs, I would want to know:\n\n```\nDid a lifecycle script appear?\nDid an existing package script change?\nDid dependencies change without an expected lockfile change?\n```\n\nAgain, not every finding should block. But these changes should be easy to see.\n\nThis is where deterministic evidence meets human ownership.\n\nA PR can stay in scope, avoid workflow escalation, and include test evidence, but still need the right reviewer.\n\nExamples:\n\n```\nsrc/auth/** changed -> security reviewer expected\n.github/workflows/** changed -> platform reviewer expected\n.mcp.json changed -> maintainer/platform approval expected\n```\n\nI do not think this should always block by default, especially for solo maintainers. But for teams, reviewer evidence may be one of the most useful signals.\n\nThe question is:\n\n```\nDid the right human approve the risky part of this PR?\n```\n\nThat is not a replacement for review. 
It is a way to make ownership visible.\n\nA lot.\n\nI would not want deterministic CI to answer questions like:\n\n```\nIs this design good?\nIs this abstraction worth it?\nWill users understand this behavior?\nIs this bug fix actually correct?\n```\n\nThose are judgment questions.\n\nLLM reviewers can help with judgment. Human reviewers own judgment. Deterministic gates should focus on evidence that can be checked the same way every time.\n\nThe best candidates are checks that are:\n\nFor me, that currently includes:\n\nThese checks do not make an AI-generated PR safe.\n\nThey make the risk easier to inspect before merge.\n\nThe safest adoption path is not to block everything on day one.\n\nI would start with warnings:\n\n```\nmode: warn\nfail-on-block: false\n```\n\nThen observe real PRs.\n\nWhich findings are useful?\n\nWhich ones are noisy?\n\nWhich ones would you trust as merge gates?\n\nOnly after that would I promote low-noise findings to blocking checks.\n\nI think AI-generated PR review will need both judgment and evidence.\n\nLLM reviewers can help with judgment.\n\nDeterministic CI checks should verify merge evidence.\n\nI’m exploring this idea in Agent Gate, a small GitHub Action for deterministic merge evidence in AI-generated PRs.\n\nDisclosure: I used AI assistance to help draft and edit this article, and I reviewed the technical claims before publishing.", "url": "https://wpnews.pro/news/llm-reviewers-are-useful-but-some-pr-checks-should-stay-deterministic", "canonical_source": "https://dev.to/sjh9714/llm-reviewers-are-useful-but-some-pr-checks-should-stay-deterministic-4k1e", "published_at": "2026-06-16 20:17:27+00:00", "updated_at": "2026-06-16 20:47:23.640331+00:00", "lang": "en", "topics": ["artificial-intelligence", "developer-tools", "ai-agents", "ai-safety"], "entities": ["GitHub Actions", "LLM", "Agent Gate"], "alternates": {"html": "https://wpnews.pro/news/llm-reviewers-are-useful-but-some-pr-checks-should-stay-deterministic", "markdown": 
"https://wpnews.pro/news/llm-reviewers-are-useful-but-some-pr-checks-should-stay-deterministic.md", "text": "https://wpnews.pro/news/llm-reviewers-are-useful-but-some-pr-checks-should-stay-deterministic.txt", "jsonld": "https://wpnews.pro/news/llm-reviewers-are-useful-but-some-pr-checks-should-stay-deterministic.jsonld"}}