Show HN: Avera – a deterministic check that proves no regression was introduced

A developer released Avera, an open-source tool that deterministically detects introduced regressions by comparing baseline and current test runs, blocking releases only when a previously passing test fails. The tool provides a tamper-evident evidence trail and supports multiple safety-critical domains like automotive and medical, aiming to prevent regressions from slipping through green CI pipelines, especially with AI-generated code.
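The core rule in the summary above is simple: block only when a previously passing test now fails. It fits in a few lines of standard-library Python. This is a hypothetical sketch, not AVERA's code. It makes several assumptions of its own:

- test ids take the form `classname.name`;
- any unrecognized `<testcase>` child counts as a failure, to mimic the fail-closed behavior the README describes;
- tests missing from either run are ignored;
- `allow` is an invented label for the non-blocking gate.

```python
# Hypothetical sketch of a pass->fail regression gate over two JUnit XML
# results; not AVERA's implementation. Test ids are "classname.name".
import xml.etree.ElementTree as ET

# Child elements that carry no verdict; any other unexpected child fails closed.
NEUTRAL_TAGS = {"system-out", "system-err", "properties"}


def outcomes(junit_xml: str) -> dict[str, str]:
    """Map every <testcase> to 'passed', 'skipped', or 'failed'."""
    result: dict[str, str] = {}
    for case in ET.fromstring(junit_xml).iter("testcase"):
        test_id = f"{case.get('classname', '')}.{case.get('name', '')}"
        tags = {child.tag for child in case}
        if tags & {"failure", "error"}:
            result[test_id] = "failed"
        elif "skipped" in tags:
            result[test_id] = "skipped"
        elif tags - NEUTRAL_TAGS:
            result[test_id] = "failed"  # unrecognized status: fail closed
        else:
            result[test_id] = "passed"
    return result


def gate(baseline_xml: str, current_xml: str) -> tuple[str, list[str], int]:
    """Return (gate, introduced failures, exit code)."""
    before, after = outcomes(baseline_xml), outcomes(current_xml)
    # Only a test that passed before and fails now counts; pre-existing
    # failures and tests missing from either run are deliberately ignored.
    introduced = sorted(
        t for t, status in before.items()
        if status == "passed" and after.get(t) == "failed"
    )
    return ("block", introduced, 1) if introduced else ("allow", [], 0)


BASELINE = """<testsuite>
  <testcase classname="pkg.tests" name="test_thing"/>
  <testcase classname="pkg.tests" name="test_other"><failure/></testcase>
</testsuite>"""
CURRENT = """<testsuite>
  <testcase classname="pkg.tests" name="test_thing"><failure message="boom"/></testcase>
  <testcase classname="pkg.tests" name="test_other"><failure/></testcase>
</testsuite>"""

status, introduced, code = gate(BASELINE, CURRENT)
print(status, introduced, code)  # block ['pkg.tests.test_thing'] 1
```

`test_other` fails in both runs, so it is a pre-existing failure and does not block. Only `test_thing` moved from pass to fail. A CI step would exit with the returned code.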

A deterministic regression gate for code changes. Green CI proves nothing failed; AVERA proves nothing regressed. AVERA compares a baseline test run against the current one and blocks a release only when there is proof of an introduced regression (a test that passed before and fails now). A tamper-evident evidence trail stands behind every verdict. It is local-first and deterministic, with no LLM in the decision.

Install from source (AVERA is not yet on PyPI), then point it at two JUnit files. You get a verdict and a gate decision, with no project setup and no requirements file:

```bash
git clone https://github.com/tc7kxsszs5-cloud/avera && cd avera
pip install -e .
avera check --baseline main.xml --current pr.xml
```

```
AVERA Check
Verdict: confirmed regression
Introduced failures (1): pkg.tests.test_thing
Gate (general.v1): block
```

The command exits with status 1, which fails the CI step. It works with anything that emits JUnit/xUnit XML (pytest, jest, go test, JUnit, …). Add `--json` for machine-readable output; the exit code drops into any pipeline.

AVERA ships a public blind-replay benchmark of real regressions. AVERA is given only the before/after test results, with no hint where the bug is, and must catch it:

```
$ AVERA_PY=python3 ./benchmark/reproduce.sh
PASS toolz-f0831e7 - confirmed regression / block
```

That case is commit f0831e7 in the real [pytoolz/toolz](https://github.com/pytoolz/toolz) library, which was later reverted in PR 551. Given only the two result sets, AVERA identified the introduced failure on its own (`test_isiterable`, pass→fail). It ruled the failure a confirmed regression and returned `gate=block` under every domain policy. See [`benchmark/`](https://github.com/tc7kxsszs5-cloud/avera/blob/main/benchmark), and add your own case.

A passing CI run proves only that no test reported a failure, not that nothing regressed. When production breaks after a green merge, there is no machine-checkable record of what regressed or why the merge was allowed. Teams reconstruct it by hand after an incident.
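The missing "machine-checkable record" mentioned above can be as small as a canonically serialized decision whose hash anyone can recompute. This is a minimal sketch built on invented field names (`baseline_sha256`, `current_sha256`, `introduced_failures`, `gate`, `root`). AVERA's actual evidence manifest binds a larger artifact set and may use a different layout. An unkeyed root only detects edits if the root itself is kept somewhere trusted, for example in a sign-off bound to it, as the README describes.

```python
# Hypothetical sketch of a machine-checkable decision record; the field
# names are invented for illustration and are not AVERA's manifest format.
import hashlib
import json


def _sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def _canonical(obj: dict) -> bytes:
    # Sorted keys and fixed separators: equal content always yields equal bytes.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def decision_record(baseline: bytes, current: bytes,
                    introduced: list[str], gate: str) -> dict:
    """Bind a gate decision to the exact input bytes it was derived from."""
    body = {
        "baseline_sha256": _sha256(baseline),
        "current_sha256": _sha256(current),
        "introduced_failures": sorted(introduced),
        "gate": gate,
    }
    return {**body, "root": _sha256(_canonical(body))}


def verify_record(record: dict) -> bool:
    """Recompute the root; a mismatch means a field changed after it was taken."""
    body = {k: v for k, v in record.items() if k != "root"}
    return _sha256(_canonical(body)) == record["root"]


rec = decision_record(b"<baseline run>", b"<current run>",
                      ["pkg.tests.test_thing"], "block")
print(verify_record(rec))                       # True
print(verify_record({**rec, "gate": "allow"}))  # False
```

Sorting the failure list and fixing the JSON separators is what makes "same inputs, same root" hold. Two machines that see the same result files and reach the same decision produce byte-identical records.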
With AI agents now generating PRs faster than anyone can review them, "the suite was green" and "that test is just flaky" are exactly how genuine pass→fail regressions slip through. AVERA gives the reviewer a deterministic way to separate proven introduced regressions from everything else. It also keeps a tamper-evident trail behind every gate decision.

What AVERA does not do, stated plainly, because overclaiming is the failure mode this project avoids:

- It does not catch a regression that no test exercises. That needs fault-injection / mutation analysis, not the gate.
- It does not decide flaky vs. real. That stays a human call.
- It does not decide your release. It produces auditable evidence, and a human signs off. There is no LLM in the decision path.
- It is not a certified/qualified tool. Its output is designed so that a human can re-check it independently (inspectable manifest, hash-chained audit, re-derivable integrity root).

The same deterministic engine runs in every domain, calibrated per domain via policy-as-data. Verdict assignment is a proven-total decision table, meaning every combination of inputs maps to a verdict. See the [verdict specification](https://github.com/tc7kxsszs5-cloud/avera/blob/main/docs/AVERA_VERDICT_SPECIFICATION.md).

| Domain | Standard / scope | Status |
|---|---|---|
| Software / CI / DevOps | Plain pass/fail CI, AI-PR triage | ✅ |
| Automotive (ADAS, BMS) | ISO 26262 | ✅ |
| Aviation (avionics) | DO-178C | ✅ |
| Railway (signaling, control) | CENELEC EN 50128 | ✅ |
| Medical devices | IEC 62304 / ISO 14971 | ✅ |
| Space / flight software | NASA NPR 7150.2 / NASA-STD-8739.8 | ✅ |

Pick a policy with `--policy <name>`: `general`, `automotive`, `aviation`, `railway`, `medical`, `space`, or `ai_agent`.

What it does:

- Zero-config check: `avera check` on two JUnit files returns a verdict and a gate decision, for plain pass/fail CI.
- Regression triage: baseline vs. current comparison with fail-closed classification (an unknown status is treated as a failure, never hidden).
- Deterministic gate: policy-as-data per domain. The same inputs give the same verdict and the same evidence root on any machine.
- Evidence manifest: a content-addressed integrity root binding the whole artifact set.
- Immutable audit log: SHA-256 hash-chained, with an optional keyed HMAC tamper-evident mode.
- Sign-off: bound to the manifest root; fails closed if verification is skipped.
- Requirement coverage proof: traceable from change to test to requirement (regulated domains).
- REST API and GitHub Action: for CI/CD integration (see below).

Repository layout:

```
src/avera/
├── adapters/    artifact format adapters (JUnit, CSV, simulation, logs, CANoe)
├── compare/     baseline vs current comparison (fail-closed status taxonomy)
├── classify/    regression classification + proven-total verdict spec
├── gates/       deterministic gate, policy-as-data (policies/*.json)
├── evidence/    content-addressed evidence manifest (integrity root)
├── audit/       hash-chained SHA-256 audit log (optional keyed HMAC)
├── signoff/     sign-off state machine bound to the manifest root
├── domains/     per-domain profiles (avionics, powertrain, space, …)
├── mutation/    fault-injection / mutation-based confidence lens
└── api/         FastAPI REST endpoint
benchmark/       public blind-replay regression benchmark (reproduce.sh)
fixtures/        reference scenarios across domains
docs/            verdict spec, hardening report, dev principles, GTM
tests/           unit + cross-domain fixtures + exhaustive verdict-spec proof
```

Full demo install:

```bash
git clone https://github.com/tc7kxsszs5-cloud/avera
cd avera
pip install -e ".[demo]"

# Run the live demo shell (serves http://localhost:8501)
./start_demo.sh

# Or analyze a full evidence pack
avera analyze --project fixtures/bms-fast-charge --out reports
```

You can also try the hosted demo preview with no install: 👉 [avera-production.up.railway.app](https://avera-production.up.railway.app). It is a read-only preview of the Streamlit shell, not full self-service.

AVERA ships as a reusable GitHub Action with two modes.

Zero-config mode gates plain pass/fail CI with two JUnit files and no evidence pack:

```yaml
# .github/workflows/avera-verify.yml
name: AVERA
on: pull_request
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: tc7kxsszs5-cloud/avera@v1
        with:
          baseline: main-junit.xml   # known-good results, e.g. from main
          current: pr-junit.xml      # this PR's results
          policy: general            # or space / automotive / aviation / …
```

The job fails when the gate blocks a confirmed regression.

Full evidence-pack mode produces the canonical artifact set for regulated review:

```yaml
- uses: actions/checkout@v4
- uses: tc7kxsszs5-cloud/avera@v1
  with:
    project_path: evidence/my-change
    fail_on_release_blocking: 'true'
```

Inputs: `project_path` (required), `output_path`, `policy`, `fail_on_release_blocking`, `fail_on_regression`, `expected_verdict`.

Outputs: `verdict`, `risk`, `confidence`, `gate_status`, `report_path`, `manifest_path`, `integrity_root`, `audit_log_path`.

Examples: [examples/github-action-usage.yml](https://github.com/tc7kxsszs5-cloud/avera/blob/main/examples/github-action-usage.yml), [examples/github-action-minimal.yml](https://github.com/tc7kxsszs5-cloud/avera/blob/main/examples/github-action-minimal.yml).

The REST API is served with uvicorn (`avera_api.main:app`):

```bash
uvicorn avera_api.main:app --host 0.0.0.0 --port 8000

# Full canonical artifact set + deterministic gate status + integrity root
curl -X POST http://localhost:8000/evidence-pack \
  -H "Content-Type: application/json" \
  -d '{"project": "fixtures/bms-fast-charge", "policy": "automotive"}'
```

`/evidence-pack` returns the following:

- the verdict, risk, and confidence;
- the deterministic gate status;
- the evidence-manifest integrity root;
- a decision summary;
- the on-disk paths of every canonical artifact.

The CLI is also published as a container image:

```bash
docker pull ghcr.io/tc7kxsszs5-cloud/avera-cli:latest
docker run --rm \
  -v "$PWD/fixtures/bms-fast-charge:/workspace" \
  -v "$PWD/reports:/reports" \
  ghcr.io/tc7kxsszs5-cloud/avera-cli:latest \
  analyze --project /workspace --out /reports --memory /reports/avera-memory.jsonl
```

The image is multi-arch (`linux/amd64`, `linux/arm64`). Pinned tags: `latest`, `vX.Y.Z`, `sha-<short>`.

Looking for engineering teams who want a narrow pilot with their own artifacts. That includes teams running ordinary CI and teams in automotive, aviation, railway, medical, or space. The pilot is simple: one software change, one artifact family you already export, and one 2-week review session.
No infrastructure changes, no process disruption.

📩 Contact: [mgaloyan79@gmail.com](mailto:mgaloyan79@gmail.com) · 🔗 Demo: [avera-production.up.railway.app](https://avera-production.up.railway.app)

Apache 2.0; see [LICENSE](https://github.com/tc7kxsszs5-cloud/avera/blob/main/LICENSE).

AVERA Engineering: engineering truth, preserved as evidence.
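For readers weighing the tamper-evidence claims, here is a sketch of the idea behind the "SHA-256 hash-chained audit log with an optional keyed HMAC mode" listed among the features. It is a hypothetical sketch, not AVERA's on-disk format. The entry layout (`event`, `prev`, `hash`) and the all-zero genesis value are assumptions made for illustration.

```python
# Hypothetical sketch of a SHA-256 hash-chained audit log with an optional
# keyed HMAC mode; the entry layout is invented, not AVERA's on-disk format.
import hashlib
import hmac
import json
from typing import Optional

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry


def _link(prev: str, event: dict, key: Optional[bytes]) -> str:
    """Hash of the previous link plus this event, keyed if a key is given."""
    payload = prev.encode() + json.dumps(event, sort_keys=True).encode()
    if key is None:
        return hashlib.sha256(payload).hexdigest()
    # Keyed mode: without the key, nobody can recompute valid links.
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append(log: list, event: dict, key: Optional[bytes] = None) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": _link(prev, event, key)})


def verify_chain(log: list, key: Optional[bytes] = None) -> bool:
    """Walk the chain; editing, reordering, or removing a non-final entry
    breaks it (detecting tail truncation needs an externally anchored head)."""
    prev = GENESIS
    for entry in log:
        expected = _link(prev, entry["event"], key)
        if entry["prev"] != prev or not hmac.compare_digest(entry["hash"], expected):
            return False
        prev = entry["hash"]
    return True


log: list = []
append(log, {"action": "gate", "result": "block"})
append(log, {"action": "signoff", "by": "reviewer"})
print(verify_chain(log))               # True
log[0]["event"]["result"] = "allow"    # tamper with history
print(verify_chain(log))               # False
```

In unkeyed mode, anyone who can rewrite the whole file can also recompute every link. The keyed HMAC mode closes that gap, because forging links requires the secret key.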