{"slug": "show-hn-data-review-diff-the-data-numbers-a-pr-changes", "title": "Show HN: Data-review, diff the data/numbers a PR changes", "summary": "A developer released Data-review, an open-source agent skill that automatically diffs data and numbers changed by a pull request, re-running affected pipeline lanes to catch bugs like sign flips, NULLs, and double counting that unit tests miss. The tool caught a real cents-to-dollars conversion error in under a minute during a demo, and is designed for data engineering and financial controlling workflows.", "body_md": "*Change-time review for data-engineering and financial-controlling work.*\n\nAn agent skill that reviews the **data impact** of a code change: it re-runs the\naffected pipeline lanes and checks whether the numbers move the way the author\nsaid they would.\n\nCode review checks the code; tests check the assertions you remembered to write.\nNeither runs a query, so locale mis-parses, sign flips, silent NULLs, join\nfan-out, and double counting all pass green. **Scripts compute, agents judge:**\nno agent does arithmetic a query can do, no script decides whether a number\n*should* have moved.\n\n```\nyou             →  a PR touching parsers / transforms / models / deliverables\nblast_radius    →  which lanes the diff hits                    (deterministic)\nT1  tieouts     →  is the world in a known-good state?          
(deterministic)\nT2  rerun+diff  →  scratch rebuild vs committed baseline        (deterministic)\nT3  row diff    →  row-level EXCEPT ALL + top-k moved groups    (deterministic)\njudgment layer  →  declared / explained / UNEXPLAINED           (agents)\nreport          →  findings ranked by € impact + coverage table\n```\n\nA real before/after data bug caught in under a minute, no agent or account:\n\n```\ngit clone https://github.com/10fra/data-review.git\ncd data-review && ./demo.sh\n```\n\nIt builds a toy two-stage pipeline, blesses a baseline, then removes a\ncents-to-dollars conversion in staging. A unit test and the orders-vs-staging\ntie-out both stay green (everything scaled in lockstep); the baseline diff\ncatches `sum(amount)` jumping $134.50 → $13,450 and escalates it.\n\n```\ngit clone https://github.com/10fra/data-review.git\ncd data-review && ./install.sh\n```\n\n`install.sh` builds the venv, links the skill into `~/.claude/skills/`, and runs\nthe tests; re-run after a `git pull`. A consumer project carries a\n`data-review.yml` and committed baselines under `.data-review/baselines/`.\n\nIn a project with a `data-review.yml`, ask Claude Code:\n\nreview the data impact of this branch\n\nAuto-discovered from `~/.claude/skills/`, it runs the evidence scripts, then\nreviews the diff through ten failure-class lenses (each seeded from a real\nproduction bug), checks every delta against the PR's `## Expected data impact`,\nand verifies each finding adversarially before it reaches the report.\n\nDeterministic, so runnable by hand or in CI. 
The `data-review` launcher uses the\nrepo's venv from any consumer directory:\n\n```\ndata-review blast --manifest data-review.yml --diff main...HEAD   # diff × manifest → tier plan\ndata-review bless --manifest data-review.yml --lane <name>        # snapshot live state → committed baseline\ndata-review t1    --manifest data-review.yml                      # tieouts + live-vs-baseline conformance\ndata-review t2    --manifest data-review.yml --lane <name>        # scratch re-run, metric diff vs baseline\ndata-review t3    --manifest data-review.yml --lane <name> --scratch <db>  # row-level EXCEPT ALL\ndata-review xlsx  --manifest data-review.yml --workbook <path>    # Excel tie-outs, hardcodes, short SUMs\n```\n\n(`alias dr=/path/to/data-review/data-review` to shorten.)\n\nOne `data-review.yml` per project, declaring what the skill cannot infer:\n\n```\nengine:                 # where the numbers live (DuckDB; adapters convert anything else)\nlanes:                  # unit of blast radius: code globs + scratch-safe run command + output tables\nbaselines:              # metrics + group-by dims snapshotted into committed JSON\ntieouts:                # cross-artifact invariants, verified on live data before being committed\ndeliverables:           # Excel workbooks audited against SQL\ncontext:                # prose priors for the judgment agents (locale, sign conventions, known traps)\n```\n\nA lane run command must take a scratch-target override (`scratch_env`) or the\nskill refuses it, so a re-run never touches the canonical store. It must build\nwhat it consumes, or it re-runs stale code; non-DuckDB outputs convert there too.\n\n- **Baselines are committed JSON.** Re-blessing is a reviewed git change, so baseline drift shows up in diffs.\n- **Tiers escalate mechanically.** Any beyond-tolerance T2 delta makes the lane a T3 candidate; `money_critical` lanes always get T3. 
Whether the movement is *explained* is the agents' call.\n- **Findings are falsifiable.** Each carries a one-sentence claim, € impact from evidence, a reproduce command, and its verification votes.\n- **Coverage is reported, not implied.** The report ends with a lane × tier table; an empty report is not a clean review.\n\nTested on four stacks (one private, three public), replaying **real historical\nbugs** from their git history.\n\n| Project | Stack | Bug replayed / injected | Caught | Signal |\n|---|---|---|---|---|\n| consumer #1 (private) | TS parsers + DuckDB | supplier A amount explosion (real) | T2 | sum €63.5M → €3.91B, 74 group deltas, zero leakage |\n| consumer #1 (private) | TS parsers + DuckDB | supplier B US-locale dates (real) | T2 | null_rate(accounting_period) +16pp, 96 group deltas |\n\nPublic replays include [owid/etl](https://github.com/owid/etl) (`ef7a53c9b`, real; `accidents`) and [pudl](https://github.com/catalyst-cooperative/pudl) (`09d8efa7d`, real).\n\nEvery clean-control rerun produced **zero** beyond-tolerance facts, deterministic\nto the cent, and one caught real drift on its own (a canonical store stale against\nmerged parser fixes). The recurring lesson: **tie-outs and pipeline tests are\nblind to scaling, double-counting, and drop distortions.** Shares renormalize,\nidentities scale in lockstep, sets dedupe. Only the committed-baseline diff is\nanchored outside the change.\n\n- Scripts emit **facts, never findings**. Severity is the judgment layer's job.\n- Exit 2 (`evidence-unavailable`) is **never a pass**. Missing input is a finding, not a skip.\n
- Silence never looks green: unmatched files, skipped formulas, dropped metrics, and unavailable tiers are all reported facts.\n- A stale scratch database is never diffed; pre-existing scratch files are deleted before every re-run.\n- A tieout is only committed after it has been verified to hold on live data.\n- € impact is computed from evidence by script, never estimated by an agent.\n\n- Write `data-review.yml` (schema: `skills/data-review/manifest.schema.json`).\n- `data-review bless --manifest data-review.yml --lane <name>`, commit the baselines.\n- `data-review t1 --manifest data-review.yml`; a clean run means ready.\n\nCost: 3 minutes (dbt + DuckDB) to half a day (dagster monorepo).", "url": "https://wpnews.pro/news/show-hn-data-review-diff-the-data-numbers-a-pr-changes", "canonical_source": "https://github.com/10fra/data-review", "published_at": "2026-06-14 18:58:02+00:00", "updated_at": "2026-06-14 19:11:41.645827+00:00", "lang": "en", "topics": ["ai-tools", "developer-tools", "ai-agents"], "entities": ["Data-review", "Claude Code", "DuckDB", "GitHub"], "alternates": {"html": "https://wpnews.pro/news/show-hn-data-review-diff-the-data-numbers-a-pr-changes", "markdown": "https://wpnews.pro/news/show-hn-data-review-diff-the-data-numbers-a-pr-changes.md", "text": "https://wpnews.pro/news/show-hn-data-review-diff-the-data-numbers-a-pr-changes.txt", "jsonld": "https://wpnews.pro/news/show-hn-data-review-diff-the-data-numbers-a-pr-changes.jsonld"}}