Source: github.com · Topic: machine-learning

Show HN: Classify mechanical faults using Contrastive Language-Audio Pretraining

A developer released cardiag, an open-source audio-ML pipeline that uses Contrastive Language-Audio Pretraining (CLAP) to classify mechanical faults from phone recordings. The tool achieves 0.79 AUROC for fault detection and provides calibrated triage, returning "uncertain" when confidence is low. The project is available as a CLI and a web app.

3 min read · published Jul 1, 2026

cardiag is an end-to-end audio-ML pipeline. It scrapes fault-sound clips from YouTube/TikTok, cleans the audio (isolating the mechanical sound from speech, music, and noise), embeds it with a frozen CLAP model, and trains small linear heads to triage the fault. It is exposed as a CLI and a live web app.
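A "small linear head" on a frozen embedding is a logistic-regression layer trained on fixed vectors. The encoder (here CLAP) is never updated. The sketch below trains one with full-batch gradient descent so that it needs only the standard library. It is an illustration under assumptions, not cardiag's training code, which this page does not show. The 4-d toy vectors stand in for 512-d CLAP embeddings, and the hyperparameters are arbitrary. In practice a library fitter such as scikit-learn's LogisticRegression would be the usual choice.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically safe logistic function (avoids overflow for large |z|).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_linear_head(X, y, lr=0.5, epochs=200, l2=1e-3):
    """Fit p(fault | x) = sigmoid(w.x + b) on frozen embedding vectors X.

    X: list of equal-length float lists (the embeddings are never modified).
    y: list of 0/1 labels (1 = fault).
    """
    dim = len(X[0])
    w = [0.0] * dim
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the logit is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Full-batch step with a small L2 penalty on the weights (not the bias).
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy stand-ins for embeddings: "fault" clips cluster near +1 on the first axis.
rng = random.Random(0)
X = [[1.0 + rng.gauss(0, 0.3), rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
X += [[-1.0 + rng.gauss(0, 0.3), rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
y = [1] * 50 + [0] * 50
w, b = train_linear_head(X, y)
```

Because the embedding is frozen, a head like this trains in seconds and can be refit per task (fault, region, part) on the same vectors.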

Demo video: cardiag-demo.mp4

This is a proof of concept, and it is honest about what that means. Diagnosing a car fault from a phone recording is genuinely hard, so cardiag is built as a calibrated triage aid rather than a diagnoser: it tells you whether something sounds wrong, roughly where in the car the problem is, and a ranked shortlist of likely parts. When the audio won't support a call, it says "uncertain" instead of bluffing.
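Saying "uncertain" instead of bluffing amounts to a reject option on a calibrated probability: the tool commits only when the score is far enough from the decision boundary. The following is a minimal sketch of that idea, assuming a single fault probability and a symmetric abstain band. The function name `triage` and the 0.35/0.65 thresholds are hypothetical and are not values taken from cardiag.

```python
from typing import Literal

Verdict = Literal["fault", "normal", "uncertain"]


def triage(p_fault: float, low: float = 0.35, high: float = 0.65) -> Verdict:
    """Map a calibrated fault probability to a verdict with an abstain band.

    Hypothetical thresholds: strictly between `low` and `high` the evidence is
    treated as too weak to call either way, so the function abstains.
    """
    if not 0.0 <= p_fault <= 1.0:
        raise ValueError("p_fault must be a probability in [0, 1]")
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("need 0 <= low < high <= 1")
    if p_fault >= high:
        return "fault"
    if p_fault <= low:
        return "normal"
    return "uncertain"


for p in (0.9, 0.5, 0.1):
    print(p, triage(p))
```

The band is only meaningful if the probabilities are calibrated. That is why a calibration metric (ECE) is reported alongside the accuracy figures.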

The real contribution is the cleaning and honest-training recipe, which can be reused on other audio datasets. The modest accuracy here reflects how hard the problem is with crude phone audio (the results reach the ceiling reported in the literature); the same method reaches 0.93 AUROC on clean engine audio. See docs/DEFENSE.md.

Two pages visualize the first two stages of the pipeline:

Isolating the engine audio: an interactive look at the clean() cascade pulling a short mechanical span out of noisy YouTube audio (speech, music, road noise).

CLAP, visualized: how the frozen CLAP model turns those spans into the 512-d embedding that the linear heads classify.
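This page does not describe the clean() cascade in enough detail to reproduce it. As a deliberately simpler stand-in, the sketch below finds loud, sustained spans with a frame-level RMS energy gate, which is only the crudest possible first stage of such a cascade. The function name, frame size, threshold, and minimum span length are all assumptions. A real cascade must also reject speech and music, which an energy gate cannot do.

```python
import math
from typing import List, Sequence, Tuple


def energy_spans(samples: Sequence[float], sr: int, frame_s: float = 0.05,
                 threshold: float = 0.1, min_span_s: float = 0.5) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) spans whose frame RMS stays at or above `threshold`.

    Hypothetical energy gate, not cardiag's clean(): it only separates loud
    regions from quiet ones.
    """
    frame = max(1, int(frame_s * sr))
    n_frames = len(samples) // frame
    runs = []
    start = None
    for i in range(n_frames):
        chunk = samples[i * frame:(i + 1) * frame]
        rms = math.sqrt(sum(s * s for s in chunk) / frame)
        if rms >= threshold and start is None:
            start = i  # a loud run begins
        elif rms < threshold and start is not None:
            runs.append((start, i))  # the loud run ended at frame i
            start = None
    if start is not None:
        runs.append((start, n_frames))
    # Convert frame indices to seconds and drop runs that are too short.
    spans = []
    for a, b in runs:
        t0, t1 = a * frame / sr, b * frame / sr
        if t1 - t0 >= min_span_s:
            spans.append((t0, t1))
    return spans


# 3 s at 8 kHz: silence, then a 1 s 100 Hz tone, then silence.
sr = 8000
sig = [0.0] * sr + [0.5 * math.sin(2 * math.pi * 100 * t / sr) for t in range(sr)] + [0.0] * sr
print(energy_spans(sig, sr))
```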

All results are measured out-of-sample and leakage-safe (cross-validation grouped by video over 1,031 video groups; permutation test p = 0.0005). These are honest numbers, not a leaderboard.
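Grouping cross-validation by video means that every clip cut from the same source video lands in the same fold, so the model is never tested on a near-duplicate of something it trained on. The sketch below shows the fold assignment, assuming each clip carries a video ID. Dealing shuffled groups round-robin is our simplification. scikit-learn's GroupKFold does the same job while also balancing fold sizes.

```python
import random
from typing import Dict, List, Sequence, Tuple


def grouped_folds(groups: Sequence[str], k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k (train, test) pairs with no group in both.

    groups[i] is the source-video ID of sample i.
    """
    unique = sorted(set(groups))
    if k < 2 or k > len(unique):
        raise ValueError("k must be between 2 and the number of groups")
    random.Random(seed).shuffle(unique)
    # Assign whole groups (videos) to folds round-robin, never single clips.
    fold_of: Dict[str, int] = {g: i % k for i, g in enumerate(unique)}
    splits = []
    for f in range(k):
        test = [i for i, g in enumerate(groups) if fold_of[g] == f]
        train = [i for i, g in enumerate(groups) if fold_of[g] != f]
        splits.append((train, test))
    return splits


# Six clips from three videos: no video ever straddles train and test.
vids = ["a", "a", "b", "b", "b", "c"]
for train, test in grouped_folds(vids, k=3):
    assert not {vids[i] for i in train} & {vids[i] for i in test}
```

A clip-level random split would let clips from the same video appear on both sides, which would inflate the scores.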

Capability                          Result                                      vs. chance
Is something wrong? (fault/normal)  AUROC 0.79 [0.76, 0.83]                     0.50
Where in the car? (6 zones)         right zone in top-3 ≈ 75%                   2×
Which part? (12+ families)          right part in top-3 ≈ 45–65%                3–4×
Knows when it doesn't know          calibrated (ECE ≈ 0.04), returns UNCERTAIN  n/a
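The metrics in the table can be computed from held-out predictions. The sketch below covers three of them: AUROC in its rank (Mann-Whitney) form, top-k accuracy for the zone and part heads, and expected calibration error with equal-width bins. The function names are ours. The 10-bin default for ECE is a common choice, and we don't know which binning cardiag uses.

```python
from typing import List, Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def top_k_accuracy(probs: Sequence[Sequence[float]], labels: Sequence[int], k: int = 3) -> float:
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    hits = 0
    for row, y in zip(probs, labels):
        top = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += y in top
    return hits / len(labels)


def expected_calibration_error(confs: Sequence[float], correct: Sequence[int], n_bins: int = 10) -> float:
    """Weighted mean |accuracy - confidence| over equal-width confidence bins."""
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, c in enumerate(confs):
        # Confidence 1.0 falls into the last bin rather than an extra one.
        bins[min(int(c * n_bins), n_bins - 1)].append(i)
    total = len(confs)
    ece = 0.0
    for idx in bins:
        if idx:
            acc = sum(correct[i] for i in idx) / len(idx)
            conf = sum(confs[i] for i in idx) / len(idx)
            ece += len(idx) / total * abs(acc - conf)
    return ece
```

The pairwise AUROC loop is quadratic. For thousands of clips, a sort-based rank computation gives the same number faster.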

Full details, including the one head we demoted after it failed out-of-sample (knock), are in docs/MODEL_CARD.md.

A fresh clone is immediately usable: a small pre-trained model ships in models/ and a synthetic demo clip is bundled, so nothing needs to be downloaded or scraped.

git clone <this-repo> && cd car-diagnosis
uv venv && source .venv/bin/activate
uv pip install -e ".[scrape,web,dev,viz]"     # Python 3.11

cardiag doctor                 # preflight: what's installed
cardiag train --fixtures       # a working model offline in ~2s (no scrape, no 2 GB download)
cardiag diagnose <clip.wav>    # verdict + where-in-the-car + ranked parts
cardiag serve --model models   # live web app: drop a clip / paste a link, "explain why"

Verify the whole thing end-to-end in an isolated worktree: bash scripts/clone_verify.sh

audio ──► clean() cascade ──► CLAP embedding ──► linear heads ──► Diagnosis
          (isolate spans)     (frozen, 512-d)    (fault/region/    (calibrated,
                                                  part/knock)       UNCERTAIN-aware)

There is one segmentation path. Scraped clips, your own recordings (cardiag ingest, any length), and uploads at inference all flow through the same clean() cascade that isolates short mechanical spans. Spans over ~10 s are split into windows so CLAP never silently truncates them. Training and serving share one embedding contract, so there is no train/serve skew.
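Splitting long spans keeps every window within the model's input length, so no part of the span is silently dropped. The sketch below shows the window arithmetic, assuming a 10 s maximum and non-overlapping windows of equal length. This page does not say how cardiag combines per-window embeddings, so the mean-pooling helper is a hypothetical choice.

```python
import math
from typing import List, Sequence, Tuple


def split_span(start_s: float, end_s: float, max_len_s: float = 10.0) -> List[Tuple[float, float]]:
    """Cut [start_s, end_s) into the fewest equal windows no longer than max_len_s."""
    length = end_s - start_s
    if length <= 0:
        return []
    n = max(1, math.ceil(length / max_len_s))
    step = length / n
    # Pin the final edge to end_s exactly so float error can't shift it.
    edges = [start_s + i * step for i in range(n)] + [end_s]
    return list(zip(edges[:-1], edges[1:]))


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical way to merge per-window embeddings: element-wise mean."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]
```

For example, split_span(0.0, 25.0) yields three contiguous windows of about 8.3 s each rather than two 10 s windows plus a 5 s remainder.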

cardiag diagnose clip.wav            # full model: verdict + region + ranked parts
cardiag triage   clip.wav            # calibrated engine-vs-running-gear
cardiag clean    clip.wav            # isolate the mechanical sound (no model needed)
cardiag inspect  clip.wav -o r.html  # SEE/HEAR the pipeline: spans, spectrograms, scores
cardiag ingest   ./my_audio --kind fault --cause wheel_bearing   # bring your own audio
cardiag scrape   youtube|tiktok      # build a corpus (Reddit is deprecated: too noisy)
cardiag train                        # train on your corpus

Add --json to any inference command for machine-readable output.
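With --json, a script can consume the CLI output directly. The sketch below uses only the standard library and assumes the command prints a single JSON document on stdout. This page does not list the field names in that document, so the helper returns the parsed object without assuming any keys. The name `run_json` is ours.

```python
import json
import subprocess
import sys
from typing import Any, Sequence


def run_json(cmd: Sequence[str], timeout: float = 120.0) -> Any:
    """Run a command and parse its stdout as one JSON document.

    Raises subprocess.CalledProcessError on a non-zero exit and
    json.JSONDecodeError if stdout is not valid JSON.
    """
    proc = subprocess.run(list(cmd), capture_output=True, text=True,
                          timeout=timeout, check=True)
    return json.loads(proc.stdout)


if __name__ == "__main__":
    # Self-check with the current Python interpreter standing in for the CLI.
    print(run_json([sys.executable, "-c", "import json; print(json.dumps({'ok': True}))"]))
    # Real usage (needs cardiag on PATH; assumes --json may follow the clip path):
    # result = run_json(["cardiag", "diagnose", "clip.wav", "--json"])
```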

docs/DEFENSE.md: the honest case that a deliberately crude method earns a real triage result.
docs/MODEL_CARD.md: per-head metrics, intended use, limitations.
docs/architecture.md: pipeline diagrams.
docs/scraping-guide.md: start-to-finish corpus building.

The model is intended for social-style or targeted-upload audio (YouTube, TikTok, or a phone clip a user records deliberately). It is not a safety-critical or standalone diagnostic; it is a triage assistant that narrows where to look and is honest about its uncertainty. Model files are joblib artifacts: load only ones you trust.
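One way to act on "load only ones you trust" is to pin a SHA-256 digest for each model file you have vetted. Any file that doesn't match is refused before it reaches joblib.load, which, like pickle, can execute code while loading. The sketch below uses hashlib. The allow-list and both function names are hypothetical, and cardiag itself may not include such a check.

```python
import hashlib
from pathlib import Path
from typing import Mapping


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large model artifacts needn't fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_model_file(path: Path, trusted: Mapping[str, str]) -> None:
    """Raise PermissionError unless `path` is allow-listed with a matching digest.

    `trusted` maps file names to expected hex digests (a hypothetical
    allow-list you maintain yourself, e.g. recorded when you vetted the model).
    """
    expected = trusted.get(path.name)
    if expected is None:
        raise PermissionError(f"{path.name} is not on the trusted list")
    actual = sha256_of(path)
    if actual != expected.lower():
        raise PermissionError(f"{path.name}: digest {actual} does not match")


# Only after verify_model_file(path, trusted) returns would you call joblib.load(path).
```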

License: see LICENSE.
