cardiag
is an end-to-end audio-ML pipeline. It scrapes fault-sound clips from YouTube/TikTok, cleans the audio (isolating the mechanical sound from speech, music, and noise), embeds it with a frozen CLAP model, and trains small linear heads to triage the fault. It is exposed as a CLI and a live web app.
cardiag-demo.mp4 #
This is a proof of concept, and honest about what that means. Diagnosing a car fault
from a phone recording is genuinely hard, so cardiag
is built as a calibrated triage aid rather than a diagnoser: it tells you whether something sounds wrong, roughly where in the car it is, and a ranked shortlist of likely parts. When the audio won't support a call, it says "uncertain" instead of bluffing.
The real contribution is the cleaning + honest-training recipe, which is reusable on other audio datasets. The modest accuracy here reflects how hard the problem is from crude phone audio (we hit the literature ceiling); the
samemethod reaches 0.93 AUROC on clean engine audio. See[docs/DEFENSE.md].
Two pages visualize the first two stages of the pipeline:
Isolating the engine audioβ an interactive look at theclean()
cascade pulling a short mechanical span out of noisy YouTube audio (speech, music, road noise).CLAP, visualizedβ how the frozen CLAP model turns those spans into the 512-d embedding the linear heads classify.
Measured out-of-sample, leakage-safe (by-video grouped CV over 1,031 video groups; permutation p = 0.0005). These are honest numbers, not a leaderboard.
| Capability | Result | vs. chance |
|---|---|---|
| Is something wrong? (fault/normal) | AUROC 0.79 [0.76, 0.83] | |
| 0.50 | ||
| Where in the car? (6 zones) | right zone in top-3 β 75% | |
| 2Γ | ||
| Which part? (12+ families) | right part in top-3 β 45β65% | |
| 3β4Γ | ||
| Knows when it doesn't know | calibrated (ECE β 0.04), returns UNCERTAIN |
|
| β |
Full details, and the one head we demoted for failing out-of-sample (knock), are in docs/MODEL_CARD.md.
A fresh clone is immediately usable. A small pre-trained model ships in models/
, and a synthetic demo clip is bundled, so nothing needs to be downloaded or scraped.
git clone <this-repo> && cd car-diagnosis
uv venv && source .venv/bin/activate
uv pip install -e ".[scrape,web,dev,viz]" # Python 3.11
cardiag doctor # preflight: what's installed
cardiag train --fixtures # a working model offline in ~2s (no scrape, no 2 GB download)
cardiag diagnose <clip.wav> # verdict + where-in-the-car + ranked parts
cardiag serve --model models # live web app: drop a clip / paste a link, "explain why"
Verify the whole thing end-to-end in an isolated worktree: bash scripts/clone_verify.sh
.
audio βββΊ clean() cascade βββΊ CLAP embedding βββΊ linear heads βββΊ Diagnosis
(isolate spans) (frozen, 512-d) (fault/region/ (calibrated,
part/knock) UNCERTAIN-aware)
There is one segmentation path. Scraped clips, your own recordings (cardiag ingest
, any length), and uploads at inference all flow through the same clean()
cascade that isolates short mechanical spans. Spans over ~10 s are split into windows so CLAP never silently truncates them. Training and serving share one embedding contract, so there is no train/serve skew.
cardiag diagnose clip.wav # full model: verdict + region + ranked parts
cardiag triage clip.wav # calibrated engine-vs-running-gear
cardiag clean clip.wav # isolate the mechanical sound (no model needed)
cardiag inspect clip.wav -o r.html # SEE/HEAR the pipeline: spans, spectrograms, scores
cardiag ingest ./my_audio --kind fault --cause wheel_bearing # bring your own audio
cardiag scrape youtube|tiktok # build a corpus (Reddit is deprecated β too noisy)
cardiag train # train on your corpus
Add --json
to any inference command for machine-readable output.
docs/DEFENSE.mdβ the honest case that a deliberately crude method earns a real triage result.docs/MODEL_CARD.mdβ per-head metrics, intended use, limitations.docs/architecture.mdβ pipeline diagrams.docs/scraping-guide.mdβ start-to-finish corpus building.
Valid for social-style / targeted-upload audio (YouTube, TikTok, or a phone clip a user records deliberately). It is not a safety-critical or standalone diagnostic. It is a triage assistant that narrows where to look and is honest about its uncertainty. Model files are joblib artifacts: load only ones you trust.
License: see LICENSE.