# LLM reviewers are useful, but some PR checks should stay deterministic

> Source: <https://dev.to/sjh9714/llm-reviewers-are-useful-but-some-pr-checks-should-stay-deterministic-4k1e>
> Published: 2026-06-16 20:17:27+00:00

AI coding agents are getting better at opening pull requests.

That changes the review problem.

A normal review asks whether the code looks correct, whether the design makes sense, and whether the edge cases were considered.

Those questions still matter.

But an AI-generated pull request also raises a different kind of question:

Did the agent change something outside the intended task, and is there enough repeatable evidence to merge?

I have started thinking about this as a split between **judgment** and **evidence**.

LLM reviewers help with judgment. Agent Gate verifies deterministic merge evidence.

I do not think every review question should become a hard CI gate. Some parts of code review need human context. Some parts benefit from an LLM noticing suspicious patterns. But a few checks are mechanical enough that I want them to be deterministic, repeatable, and visible before merge.

This is the checklist I currently use when thinking about AI-generated PRs.

The first question is not whether the code is good.

It is whether the PR changed the files it was supposed to change.

For a human PR, an unrelated edit may be easy to explain in review. For an agent PR, unrelated edits are more suspicious because they may reflect instruction drift, a tool mistake, or a broad refactor the maintainer did not ask for.

A simple contract can help:

```
This PR is allowed to touch:
- src/auth/**
- tests/auth/**
```

Then the review can ask a deterministic question:

```
Did the PR touch anything outside those paths?
```

That does not prove the code is correct. It only proves the PR stayed inside its declared boundary.

That boundary still matters.
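A minimal sketch of that scope check in Python, assuming the list of changed files is already available (for example from `git diff --name-only`). The function name `out_of_scope` and the sample paths are hypothetical. `fnmatchcase` is also looser than gitignore-style globbing, because its `*` matches `/` as well.

```python
from fnmatch import fnmatchcase

# Declared boundary, copied from the contract above.
ALLOWED_PATTERNS = ["src/auth/**", "tests/auth/**"]


def out_of_scope(changed_files, allowed_patterns=ALLOWED_PATTERNS):
    """Return the changed paths that match none of the allowed patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, pattern) for pattern in allowed_patterns)
    ]


# Hypothetical changed-file list for illustration.
changed = [
    "src/auth/session.py",
    "tests/auth/test_session.py",
    ".github/workflows/ci.yml",
]
print(out_of_scope(changed))  # ['.github/workflows/ci.yml']
```

An empty result means the PR stayed inside its declared boundary. It says nothing about whether the change is correct.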

GitHub Actions workflows are one of the highest-risk places for an agent to edit.

A small source change and a workflow permission change do not have the same risk profile.

For example, I would want a very visible warning if a PR adds or changes this:

```
permissions:
  contents: write
```

or starts using `secrets.*` in a new workflow path.

This is not a semantic code review problem. It is a policy boundary problem.

The question is deterministic:

```
Did this PR increase workflow privileges or introduce a dangerous workflow pattern?
```

That kind of check should not depend on whether an LLM happened to notice it in a comment.
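One way to sketch this check is a line-based scan of the lines a PR adds to workflow files. This is a heuristic under stated assumptions, not a YAML-aware policy engine: the function name `workflow_findings` is hypothetical, the added lines are assumed to be extracted from the diff elsewhere, and any key ending in `: write` is flagged even if it sits outside a `permissions:` block.

```python
import re

# A key set to "write", e.g. "  contents: write".
WRITE_PERMISSION = re.compile(r"^\s*[\w-]+:\s*write\s*$")
# The blanket form "permissions: write-all".
WRITE_ALL = re.compile(r"^\s*permissions:\s*write-all\s*$")
# Any reference such as "secrets.DEPLOY_TOKEN".
SECRET_REF = re.compile(r"secrets\.\w+")


def workflow_findings(path, added_lines):
    """Flag write permissions and secret references added to a workflow file."""
    if not path.startswith(".github/workflows/"):
        return []
    findings = []
    for line in added_lines:
        if WRITE_PERMISSION.match(line) or WRITE_ALL.match(line):
            findings.append(f"{path}: write permission added: {line.strip()}")
        if SECRET_REF.search(line):
            findings.append(f"{path}: secret reference added: {line.strip()}")
    return findings


# Hypothetical added lines for illustration.
added = [
    "permissions:",
    "  contents: write",
    "      run: ./deploy.sh ${{ secrets.DEPLOY_TOKEN }}",
]
for finding in workflow_findings(".github/workflows/deploy.yml", added):
    print(finding)
```

Because the scan is purely textual, it gives the same answer on every run. That is the property this kind of check needs.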

AI coding agents often depend on files that shape future behavior:

```
AGENTS.md
CLAUDE.md
.github/copilot-instructions.md
.cursor/rules/**
.mcp.json
```

A change to these files can affect future agent runs, tool access, or repo-specific instructions.

That makes them different from normal documentation changes.

If an AI-generated PR edits `.mcp.json` or `AGENTS.md`, I want that surfaced clearly before merge, even if the source code diff looks harmless.

The deterministic question is:

```
Did the PR change files that control future agent behavior?
```

This is especially important for teams adopting coding agents across repositories, because the control plane can drift quietly.
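A small sketch of that check, reusing the file list from this section. The function name `agent_control_changes` is hypothetical, and the patterns only match paths relative to the repository root, which is an assumption about where these files live.

```python
from fnmatch import fnmatchcase

# Files named in this section as shaping future agent behavior.
AGENT_CONTROL_PATTERNS = [
    "AGENTS.md",
    "CLAUDE.md",
    ".github/copilot-instructions.md",
    ".cursor/rules/**",
    ".mcp.json",
]


def agent_control_changes(changed_files):
    """Return the changed paths that match an agent control pattern."""
    return [
        path
        for path in changed_files
        if any(fnmatchcase(path, pattern) for pattern in AGENT_CONTROL_PATTERNS)
    ]


# Hypothetical changed-file list for illustration.
print(agent_control_changes(["src/app.py", ".mcp.json"]))  # ['.mcp.json']
```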

Test evidence is tricky.

A changed test file does not prove the behavior is correct. It does not prove the test is meaningful. It does not prove coverage.

But for risky areas, the absence of any matching test change is still useful evidence.

If a PR changes auth logic, payment handling, session middleware, or a migration, I want to know whether the PR also changed a related test file.

The check should be phrased carefully:

```
There is no matching test-file evidence.
```

Not:

```
This PR is untested.
```

That distinction matters. Deterministic checks should say exactly what they know, and no more.
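The careful phrasing can be kept in code as well. The sketch below assumes a hypothetical mapping from risky source globs to the test globs expected alongside them. The mapping, the paths, and the function name `missing_test_evidence` are illustrative only.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping: risky source area -> where a related test would live.
RISKY_TO_TESTS = {
    "src/auth/**": "tests/auth/**",
    "src/payments/**": "tests/payments/**",
}


def missing_test_evidence(changed_files):
    """Report risky areas that changed without any matching test-file change."""
    findings = []
    for source_pattern, test_pattern in RISKY_TO_TESTS.items():
        touched = any(fnmatchcase(p, source_pattern) for p in changed_files)
        tested = any(fnmatchcase(p, test_pattern) for p in changed_files)
        if touched and not tested:
            # The message states only what the check knows.
            findings.append(
                f"{source_pattern} changed: there is no matching "
                f"test-file evidence under {test_pattern}"
            )
    return findings


# Hypothetical changed-file list for illustration.
for finding in missing_test_evidence(["src/auth/login.py", "README.md"]):
    print(finding)
```

The message never claims the PR is untested. It only reports that no file under the expected test path changed.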

This is not always the first rule I would add, but it is one I keep coming back to.

Package manifests and lockfiles can hide meaningful risk:

```
package.json
pnpm-lock.yaml
yarn.lock
package-lock.json
```

Some changes are normal dependency maintenance. Others deserve more attention:

```
{
  "scripts": {
    "postinstall": "node scripts/setup.js"
  }
}
```

For AI-generated PRs, I would want to know:

```
Did a lifecycle script appear?
Did an existing package script change?
Did dependencies change without an expected lockfile change?
```

Again, not every finding should block. But these changes should be easy to see.
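The three questions above can be sketched by comparing the `package.json` text before and after the PR. The function names are hypothetical, the set of install-time lifecycle script names is a deliberately short subset, and the lockfile check only looks at file names, not lockfile contents.

```python
import json
from pathlib import PurePosixPath

# Subset of npm scripts that run automatically around install.
INSTALL_LIFECYCLE = {"preinstall", "install", "postinstall", "prepare"}
LOCKFILES = {"package-lock.json", "pnpm-lock.yaml", "yarn.lock"}


def script_findings(old_manifest_text, new_manifest_text):
    """List package scripts that were added or changed between two manifests."""
    old_scripts = json.loads(old_manifest_text).get("scripts", {})
    new_scripts = json.loads(new_manifest_text).get("scripts", {})
    findings = []
    for name in sorted(new_scripts):
        if name not in old_scripts:
            is_lifecycle = name in INSTALL_LIFECYCLE
            label = "lifecycle script added" if is_lifecycle else "script added"
            findings.append(f"{label}: {name}")
        elif old_scripts[name] != new_scripts[name]:
            findings.append(f"script changed: {name}")
    return findings


def deps_changed_without_lockfile(old_manifest_text, new_manifest_text, changed_files):
    """True if dependency sections differ but no lockfile is in the change set."""
    old = json.loads(old_manifest_text)
    new = json.loads(new_manifest_text)
    deps_changed = any(
        old.get(key) != new.get(key) for key in ("dependencies", "devDependencies")
    )
    lockfile_changed = any(PurePosixPath(p).name in LOCKFILES for p in changed_files)
    return deps_changed and not lockfile_changed


# Hypothetical before/after manifests for illustration.
before = '{"scripts": {"test": "jest"}}'
after = '{"scripts": {"test": "jest --ci", "postinstall": "node scripts/setup.js"}}'
print(script_findings(before, after))
# ['lifecycle script added: postinstall', 'script changed: test']
```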

This is where deterministic evidence meets human ownership.

A PR can stay in scope, avoid workflow escalation, and include test evidence, but still need the right reviewer.

Examples:

```
src/auth/** changed -> security reviewer expected
.github/workflows/** changed -> platform reviewer expected
.mcp.json changed -> maintainer/platform approval expected
```

I do not think this should always block by default, especially for solo maintainers. But for teams, reviewer evidence may be one of the most useful signals.

The question is:

```
Did the right human approve the risky part of this PR?
```

That is not a replacement for review. It is a way to make ownership visible.
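A sketch of how those path-to-reviewer expectations might be checked. It assumes the roles of approving reviewers have already been resolved elsewhere (for example from team membership). The rule table, the role names, and the function name `missing_reviewer_evidence` are hypothetical.

```python
from fnmatch import fnmatchcase

# Path pattern -> roles whose approval counts as evidence for that area.
REVIEWER_RULES = [
    ("src/auth/**", {"security"}),
    (".github/workflows/**", {"platform"}),
    (".mcp.json", {"maintainer", "platform"}),
]


def missing_reviewer_evidence(changed_files, approved_roles):
    """Report risky areas that changed without an approval from an expected role."""
    approved = set(approved_roles)
    findings = []
    for pattern, accepted_roles in REVIEWER_RULES:
        touched = any(fnmatchcase(p, pattern) for p in changed_files)
        if touched and not (accepted_roles & approved):
            expected = " or ".join(sorted(accepted_roles))
            findings.append(f"{pattern} changed -> {expected} approval expected")
    return findings


# Hypothetical PR: auth and MCP config changed, only a platform reviewer approved.
print(missing_reviewer_evidence(["src/auth/login.py", ".mcp.json"], {"platform"}))
# ['src/auth/** changed -> security approval expected']
```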

A lot should stay outside deterministic checks.

I would not want deterministic CI to answer questions like:

```
Is this design good?
Is this abstraction worth it?
Will users understand this behavior?
Is this bug fix actually correct?
```

Those are judgment questions.

LLM reviewers can help with judgment. Human reviewers own judgment. Deterministic gates should focus on evidence that can be checked the same way every time.

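The best candidates are checks that are mechanical, repeatable, and visible before merge.

For me, that currently includes scope boundaries, workflow privilege changes, agent control files, test-file evidence for risky areas, dependency and package script changes, and reviewer evidence.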

These checks do not make an AI-generated PR safe.

They make the risk easier to inspect before merge.

The safest adoption path is not to block everything on day one.

I would start with warnings:

```
mode: warn
fail-on-block: false
```

Then observe real PRs.

Which findings are useful?

Which ones are noisy?

Which ones would you trust as merge gates?

Only after that would I promote low-noise findings to blocking checks.
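The promotion step can be sketched as a function that turns findings into a CI exit code. The semantics here are an assumption for illustration and are not taken from Agent Gate: `mode` and `fail_on_block` mirror the config keys above, `"enforce"` is a hypothetical second mode, and `blocking_rules` is a hypothetical set of rule IDs that have been promoted after observation.

```python
def gate_exit_code(findings, blocking_rules, mode="warn", fail_on_block=False):
    """Return 0 (pass) or 1 (fail) for a list of (rule_id, message) findings.

    Assumed semantics: in warn mode, or with fail_on_block off, findings are
    only reported. Otherwise, any finding from a promoted rule fails the check.
    """
    for rule_id, message in findings:
        level = "BLOCK" if rule_id in blocking_rules else "WARN"
        print(f"[{level}] {rule_id}: {message}")

    if mode == "warn" or not fail_on_block:
        return 0
    has_blocking = any(rule_id in blocking_rules for rule_id, _ in findings)
    return 1 if has_blocking else 0


# Hypothetical findings and rule IDs for illustration.
findings = [
    ("workflow-permissions", "contents: write added"),
    ("test-evidence", "no matching test-file evidence"),
]

# Day one: everything is a warning, the check always passes.
print(gate_exit_code(findings, blocking_rules=set()))  # 0

# Later: one low-noise rule is promoted and enforcement is switched on.
print(gate_exit_code(findings, {"workflow-permissions"}, mode="enforce", fail_on_block=True))  # 1
```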

I think AI-generated PR review will need both judgment and evidence.

LLM reviewers can help with judgment.

Deterministic CI checks should verify merge evidence.

I’m exploring this idea in Agent Gate, a small GitHub Action for deterministic merge evidence in AI-generated PRs.

Disclosure: I used AI assistance to help draft and edit this article, and I reviewed the technical claims before publishing.
