# Show HN: Data-review, diff the data/numbers a PR changes

> Source: <https://github.com/10fra/data-review>
> Published: 2026-06-14 18:58:02+00:00

*Change-time review for data-engineering and financial-controlling work.*

An agent skill that reviews the **data impact** of a code change: it re-runs the
affected pipeline lanes and checks whether the numbers move the way the author
said they would.

Code review checks the code; tests check the assertions you remembered to write.
Neither runs a query, so locale mis-parses, sign flips, silent NULLs, join
fan-out, and double counting all pass green. **Scripts compute, agents judge:**
no agent does arithmetic a query can do, no script decides whether a number
*should* have moved.

```
you             →  a PR touching parsers / transforms / models / deliverables
blast_radius    →  which lanes the diff hits                    (deterministic)
T1  tieouts     →  is the world in a known-good state?          (deterministic)
T2  rerun+diff  →  scratch rebuild vs committed baseline        (deterministic)
T3  row diff    →  row-level EXCEPT ALL + top-k moved groups    (deterministic)
judgment layer  →  declared / explained / UNEXPLAINED           (agents)
report          →  findings ranked by € impact + coverage table
```

A real before/after data bug caught in under a minute, no agent or account:

```
git clone https://github.com/10fra/data-review.git
cd data-review && ./demo.sh
```

It builds a toy two-stage pipeline, blesses a baseline, then drops a
cents-to-dollars conversion in staging. A unit test and the orders-vs-staging
tie-out both stay green (everything scaled in lockstep); the baseline diff
catches `sum(amount)`

jumping $134.50 → $13,450 and escalates it.

```
git clone https://github.com/10fra/data-review.git
cd data-review && ./install.sh
```

`install.sh`

builds the venv, links the skill into `~/.claude/skills/`

, and runs
the tests; re-run after a `git pull`

. A consumer project carries a
`data-review.yml`

and committed baselines under `.data-review/baselines/`

.

In a project with a `data-review.yml`

, ask Claude Code:

review the data impact of this branch

Auto-discovered from `~/.claude/skills/`

, it runs the evidence scripts, then
reviews the diff through ten failure-class lenses (each seeded from a real
production bug), checks every delta against the PR's `## Expected data impact`

,
and verifies each finding adversarially before it reaches the report.

Deterministic, so runnable by hand or in CI. The `data-review`

launcher uses the
repo's venv from any consumer directory:

```
data-review blast --manifest data-review.yml --diff main...HEAD   # diff × manifest → tier plan
data-review bless --manifest data-review.yml --lane <name>        # snapshot live state → committed baseline
data-review t1    --manifest data-review.yml                      # tieouts + live-vs-baseline conformance
data-review t2    --manifest data-review.yml --lane <name>        # scratch re-run, metric diff vs baseline
data-review t3    --manifest data-review.yml --lane <name> --scratch <db>  # row-level EXCEPT ALL
data-review xlsx  --manifest data-review.yml --workbook <path>    # Excel tie-outs, hardcodes, short SUMs
```

(`alias dr=/path/to/data-review/data-review`

to shorten.)

One `data-review.yml`

per project, declaring what the skill cannot infer:

```
engine:                 # where the numbers live (DuckDB; adapters convert anything else)
lanes:                  # unit of blast radius: code globs + scratch-safe run command + output tables
baselines:              # metrics + group-by dims snapshotted into committed JSON
tieouts:                # cross-artifact invariants, verified on live data before being committed
deliverables:           # Excel workbooks audited against SQL
context:                # prose priors for the judgment agents (locale, sign conventions, known traps)
```

A lane run command must take a scratch-target override (`scratch_env`

) or the
skill refuses it, so a re-run never touches the canonical store. It must build
what it consumes, or it re-runs stale code; non-DuckDB outputs convert there too.

**Baselines are committed JSON.** Re-blessing is a reviewed git change, so baseline drift shows up in diffs.**Tiers escalate mechanically.** Any beyond-tolerance T2 delta makes the lane a T3 candidate;`money_critical`

lanes always get T3. Whether the movement is*explained*is the agents' call.**Findings are falsifiable.** Each carries a one-sentence claim, € impact from evidence, a reproduce command, and its verification votes.**Coverage is reported, not implied.** The report ends with a lane × tier table; an empty report is not a clean review.

Tested on four stacks (one private, three public), replaying **real historical
bugs** from their git history.

| Project | Stack | Bug replayed / injected | Caught | Signal |
|---|---|---|---|---|
| consumer #1 (private) | TS parsers + DuckDB | supplier A amount explosion (real) | T2 | sum €63.5M → €3.91B, 74 group deltas, zero leakage |
| consumer #1 (private) | TS parsers + DuckDB | supplier B US-locale dates (real) | T2 | null_rate(accounting_period) +16pp, 96 group deltas |
|

[owid/etl](https://github.com/owid/etl)`ef7a53c9b`

, real)`accidents`

[pudl](https://github.com/catalyst-cooperative/pudl)`09d8efa7d`

, real)Every clean-control rerun produced **zero** beyond-tolerance facts, deterministic
to the cent, and one caught real drift on its own (a canonical store stale against
merged parser fixes). The recurring lesson: **tie-outs and pipeline tests are
blind to scaling, double-counting, and drop distortions.** Shares renormalize,
identities scale in lockstep, sets dedupe. Only the committed-baseline diff is
anchored outside the change.

- Scripts emit
**facts, never findings**. Severity is the judgment layer's job. - Exit 2 (
`evidence-unavailable`

) is**never a pass**. Missing input is a finding, not a skip. - Silence never looks green: unmatched files, skipped formulas, dropped metrics, and unavailable tiers are all reported facts.
- A stale scratch database is never diffed; pre-existing scratch files are deleted before every re-run.
- A tieout is only committed after it has been verified to hold on live data.
- € impact is computed from evidence by script, never estimated by an agent.

- Write
`data-review.yml`

(schema:`skills/data-review/manifest.schema.json`

). `data-review bless --manifest data-review.yml --lane <name>`

, commit the baselines.`data-review t1 --manifest data-review.yml`

; a clean run means ready.

Cost: 3 minutes (dbt + DuckDB) to half a day (dagster monorepo).
