Show HN: Avera – a deterministic check that proves no regression was introduced

A developer released Avera, an open-source tool that deterministically detects introduced regressions by comparing baseline and current test runs, blocking releases only when a previously passing test fails. The tool provides a tamper-evident evidence trail and supports multiple safety-critical domains like automotive and medical, aiming to prevent regressions from slipping through green CI pipelines, especially with AI-generated code.

A deterministic regression gate for code changes. Green CI proves nothing failed; AVERA proves nothing regressed.

AVERA compares a baseline test run against the current one and blocks a release only when there is proof of an introduced regression (a test that passed before and fails now), with a tamper-evident evidence trail behind the verdict. Local-first, deterministic, no LLM in the decision.

Install from source (AVERA is not yet on PyPI), then point it at two JUnit files: verdict + gate out, no project setup, no requirements file:

    git clone https://github.com/tc7kxsszs5-cloud/avera && cd avera
    pip install -e .
    avera check --baseline main.xml --current pr.xml

    AVERA Check
    Verdict: confirmed regression
    Introduced failures (1): pkg.tests.test_thing
    Gate (general.v1): block
    # exit 1: fails the CI step

Works with anything that emits JUnit / xUnit XML (pytest, jest, go test, JUnit, …). Add `--json` for machines; the exit code drops into any pipeline.

AVERA ships a public blind-replay benchmark of real regressions. AVERA is given only the before/after test results, with no hint where the bug is, and must catch it.

    AVERA_PY=python3 ./benchmark/reproduce.sh
    PASS toolz-f0831e7 - confirmed regression / block

That case is commit f0831e7 in the real [pytoolz/toolz](https://github.com/pytoolz/toolz) library (later reverted in PR 551).
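The blind-replay setup above reduces to one comparison: which tests passed in the baseline report and fail in the current one. The following is a minimal sketch of that comparison using only Python's standard library. It is not AVERA's implementation; the function names, the `classname.name` test identifier, and the sample reports are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def load_statuses(junit_xml: str) -> dict[str, str]:
    """Map 'classname.name' to 'pass', 'fail', or 'skip' for one JUnit report."""
    root = ET.fromstring(junit_xml)
    statuses = {}
    # iter() finds <testcase> elements under either <testsuites> or <testsuite>.
    for case in root.iter("testcase"):
        test_id = f"{case.get('classname', '')}.{case.get('name', '')}"
        if case.find("failure") is not None or case.find("error") is not None:
            statuses[test_id] = "fail"
        elif case.find("skipped") is not None:
            statuses[test_id] = "skip"
        else:
            statuses[test_id] = "pass"
    return statuses


def introduced_failures(baseline: dict[str, str], current: dict[str, str]) -> list[str]:
    """Tests that passed in the baseline and fail now (pass -> fail only)."""
    return sorted(
        test_id
        for test_id, status in current.items()
        if status == "fail" and baseline.get(test_id) == "pass"
    )


# Hypothetical sample reports, invented for this example.
BASELINE_XML = """<testsuite name="demo">
  <testcase classname="pkg.tests" name="test_thing"/>
  <testcase classname="pkg.tests" name="test_other"/>
  <testcase classname="pkg.tests" name="test_known_bad"><failure message="boom"/></testcase>
</testsuite>"""

CURRENT_XML = """<testsuite name="demo">
  <testcase classname="pkg.tests" name="test_thing"><failure message="regressed"/></testcase>
  <testcase classname="pkg.tests" name="test_other"/>
  <testcase classname="pkg.tests" name="test_known_bad"><failure message="boom"/></testcase>
  <testcase classname="pkg.tests" name="test_new"><failure message="never passed"/></testcase>
</testsuite>"""

introduced = introduced_failures(load_statuses(BASELINE_XML), load_statuses(CURRENT_XML))
exit_code = 1 if introduced else 0  # a non-zero code is what would fail a CI step
print(introduced)  # ['pkg.tests.test_thing']
```

Note that `test_known_bad` (failing in both runs) and `test_new` (no baseline result) are not reported: neither is a proven pass→fail transition, which is the only case this sketch treats as an introduced failure.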
Given only the two result sets, AVERA independently identified the introduced failure (`test_isiterable`, pass→fail), ruled confirmed regression, and returned gate=block under every domain policy. See [benchmark/](https://github.com/tc7kxsszs5-cloud/avera/blob/main/benchmark), and add your own case.

A passing CI run only proves no expressed test failed, not that nothing regressed. When prod breaks after a green merge, there is no machine-checkable record of what regressed or why the merge was allowed; teams reconstruct it by hand after an incident.

With AI agents now generating PRs faster than anyone can review them, "the suite was green" and "that test is just flaky" are exactly how genuine pass→fail regressions slip through. AVERA gives the reviewer a deterministic separator (proven introduced regression vs everything else) and a tamper-evident trail behind every gate decision.

Stated plainly, because overclaiming is the failure mode this project avoids:

- It does not catch a regression that no test exercises. That needs fault-injection / mutation analysis, not the gate.
- It does not decide flaky vs real. That stays a human call.
- It does not decide your release. It produces auditable evidence; a human signs off. No LLM in the decision path.
- It is not a certified/qualified tool. Its output is designed to be independently re-checkable by a human (inspectable manifest, hash-chained audit, re-derivable integrity root).

The same deterministic engine, calibrated per domain via policy-as-data. Verdict assignment is a [proven-total decision table](https://github.com/tc7kxsszs5-cloud/avera/blob/main/docs/AVERA_VERDICT_SPECIFICATION.md).
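To make "total decision table" and "policy-as-data" concrete, here is a small illustrative sketch. It is not the table from the linked specification: the three statuses, the verdict names other than "confirmed regression", the `strict.example` policy, the contents of `general.v1`, and the `allow` / `review` actions are all assumptions. The point is only the shape: verdicts come from a lookup over every (baseline, current) pair, gate actions come from a separate per-policy mapping, and both can be checked mechanically for gaps.

```python
from itertools import product

# Assumed status vocabulary for a single test in one run.
STATUSES = ("pass", "fail", "absent")

# Hypothetical decision table: (baseline status, current status) -> verdict.
DECISION_TABLE = {
    ("pass", "pass"): "no introduced failure",
    ("pass", "fail"): "confirmed regression",  # the only proven regression
    ("pass", "absent"): "inconclusive",
    ("fail", "pass"): "no introduced failure",
    ("fail", "fail"): "inconclusive",          # was already failing
    ("fail", "absent"): "inconclusive",
    ("absent", "pass"): "no introduced failure",
    ("absent", "fail"): "inconclusive",        # never passed, so not proven
    ("absent", "absent"): "no introduced failure",
}

# Hypothetical policies as plain data: verdict -> gate action.
POLICIES = {
    "general.v1": {
        "confirmed regression": "block",
        "inconclusive": "allow",
        "no introduced failure": "allow",
    },
    "strict.example": {
        "confirmed regression": "block",
        "inconclusive": "review",
        "no introduced failure": "allow",
    },
}


def missing_cells() -> list[tuple[str, str]]:
    """Status pairs with no verdict; an empty list means the table is total."""
    return [cell for cell in product(STATUSES, repeat=2) if cell not in DECISION_TABLE]


def unmapped_verdicts(policy: str) -> set[str]:
    """Verdicts the given policy does not assign a gate action to."""
    return set(DECISION_TABLE.values()) - set(POLICIES[policy])


def gate(policy: str, baseline: str, current: str) -> str:
    """Look up the verdict, then the policy's action for that verdict."""
    return POLICIES[policy][DECISION_TABLE[(baseline, current)]]


print(missing_cells())                     # []
print(gate("general.v1", "pass", "fail"))  # block
```

Keeping the table and the policies as data rather than branching code is what makes an exhaustive check like `missing_cells()` possible: every input combination can be enumerated and compared against the table.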

| Domain | Standard | Status |
|---|---|---|
| Software / CI / DevOps | plain pass/fail CI, AI-PR triage | ✅ |
| Automotive (ADAS, BMS) | ISO 26262 | ✅ |
| Aviation (avionics) | DO-178C | ✅ |
| Railway (signaling, control) | CENELEC EN 50128 | ✅ |
| Medical devices | IEC 62304 / ISO 14971 | ✅ |
| Space / flight software | NASA NPR 7150.2 / NASA-STD-8739.8 | ✅ |

Pick a policy with `--policy`