I built an open source VAD that beats Silero, Pyannote, and WebRTC

Developer Monish released NOVA-VAD, an open-source voice activity detector that achieves 93% accuracy on noisy audio, outperforming Silero, Pyannote, and WebRTC while remaining lightweight and explainable. The tool uses an ensemble classifier with 150+ features and provides confidence scores and decision explanations, addressing a long-standing trade-off in speech processing.

Published Jun 24, 2026

Noise-robust, Optimized, eXplainable Voice Activity Detector

NOVA-VAD is a lightweight, explainable Voice Activity Detector that, in the author's benchmark on real-world noisy audio, outperforms WebRTC, Pyannote, and Silero without requiring a GPU or PyTorch.

It was built as an open-source contribution toward a problem that has existed in speech processing for 15+ years: existing VADs are accurate, lightweight, or explainable, but never all three.

Tested on 100 held-out files from UrbanSound8K (traffic, sirens, jackhammers, AC units, construction noise):

Model	Accuracy	Precision	Recall	F1	Lightweight	Explainable
WebRTC VAD	58.0%	57.69%	60.0%	58.82%	✅	❌
Pyannote VAD	62.0%	57.32%	94.0%	71.21%	❌	❌
Silero VAD	87.0%	86.27%	88.0%	87.13%	❌	❌
NOVA-VAD	93.0%	97.78%	88.0%	92.63%	✅	✅
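
The four metric columns are consistent with one another if the test set is balanced. As a sanity check, the sketch below recomputes them from confusion-matrix counts. The counts are not published in the source: they are inferred here under the assumption that the 100 files split into 50 speech and 50 non-speech clips.

```python
# Confusion counts (tp, fp, fn, tn) are an assumption: inferred from the
# reported percentages for a hypothetical 50 speech / 50 non-speech split.
COUNTS = {
    "WebRTC VAD": (30, 22, 20, 28),
    "Pyannote VAD": (47, 35, 3, 15),
    "Silero VAD": (44, 7, 6, 43),
    "NOVA-VAD": (44, 1, 6, 49),
}

def metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 as percentages."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return tuple(round(100 * m, 2) for m in (accuracy, precision, recall, f1))

for name, counts in COUNTS.items():
    acc, prec, rec, f1 = metrics(*counts)
    print(f"{name:13s} acc={acc:.2f} prec={prec:.2f} rec={rec:.2f} f1={f1:.2f}")
```

Under that assumption, NOVA-VAD and Silero miss the same number of speech files (equal recall), and the accuracy gap comes from NOVA-VAD raising far fewer false alarms on noise.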

Feature	WebRTC	Silero	Pyannote	NOVA-VAD
Accurate on noisy audio	❌	Partial	Partial	✅
Lightweight (no PyTorch)	✅	❌	❌	✅
Fully open source	✅	Partial	✅	✅
Explains every decision	❌	❌	❌	✅
Retrainable on custom data	❌	❌	❌	✅
Confidence scores	❌	❌	❌	✅

Raw Audio → Denoiser → 150+ Features → Ensemble Classifier → SPEECH / NO SPEECH + Explanation

MFCCs + deltas (78 features) — spectral shape and change over time
Zero Crossing Rate — speech is more consistent than noise
RMS Energy pattern — speech rises and falls rhythmically
Spectral Flux — speech transitions smoothly, noise changes randomly
Harmonic/Percussive ratio — human voice is mostly harmonic
Tempo/rhythm — speech has a syllable rhythm that noise does not
Mel Spectrogram statistics — energy distribution across frequency bands
Silence ratio — proportion of frames below energy threshold
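
To make a few of these features concrete, here is a minimal pure-Python sketch of three of them: zero crossing rate, RMS energy, and silence ratio. It is not the project's code. The frame length, hop size, and silence threshold are illustrative assumptions, the function names are hypothetical, and the source does not say which library NOVA-VAD uses for feature extraction.

```python
import math

def frames(signal, frame_len=400, hop=160):
    """Split a list of samples into overlapping frames (sizes are assumptions)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def rms(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def silence_ratio(signal, threshold=0.01):
    """Proportion of frames whose RMS falls below an assumed threshold."""
    energies = [rms(f) for f in frames(signal)]
    return sum(e < threshold for e in energies) / len(energies)

# Synthetic example: 0.5 s of a 200 Hz tone followed by 0.5 s of silence at 16 kHz.
sr = 16000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr // 2)]
signal = tone + [0.0] * (sr // 2)

first = frames(signal)[0]
print("ZCR of first frame:", zero_crossing_rate(first))
print("RMS of first frame:", rms(first))
print("Silence ratio:", silence_ratio(signal))
```

In a real classifier these per-frame values would typically be summarized per file (mean, standard deviation, and so on) before being passed to the model.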

The ensemble classifier combines a Random Forest and a Gradient Boosting model that vote together.
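
The source does not say whether the two models vote on labels (hard voting) or on probabilities (soft voting). As an illustration only, the sketch below assumes soft voting with equal weights: each model's speech probability is averaged, and the result doubles as a confidence score. The stub probabilities stand in for trained Random Forest and Gradient Boosting models and are hypothetical.

```python
def soft_vote(probabilities, weights=None):
    """Average per-model speech probabilities (weighted if weights are given)."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    return sum(p * w for p, w in zip(probabilities, weights)) / total

def decide(probabilities, threshold=0.5):
    """Return (label, confidence) from the averaged speech probability."""
    p_speech = soft_vote(probabilities)
    if p_speech >= threshold:
        return "SPEECH", p_speech
    return "NO SPEECH", 1.0 - p_speech

# Hypothetical outputs of the two models for one audio file.
rf_prob, gb_prob = 0.90, 0.96
label, confidence = decide([rf_prob, gb_prob])
print(f"{label} (confidence {confidence:.2%})")
```

In practice, scikit-learn's `VotingClassifier` offers both modes through its `voting` parameter, so a hand-written averaging step like this is only needed when the models come from different libraries.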

Every prediction includes a confidence score and the top 10 features that drove the decision, described in plain English.

git clone https://github.com/monishmal3375/nova-vad.git
cd nova-vad
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python3 download_data.py
python3 -m src.pipeline
python3 -m src.explainer data/clean_speech/speech_001.wav
python3 -m src.benchmark

NOVA-VAD EXPLANATION
File: speech_001.wav
Prediction: SPEECH
Confidence: 93.47%

Why this decision was made:
MFCC Delta 1 std (10.63%) → HIGH spectral change rate — dynamic audio like speech
MFCC Delta 2 std ( 6.14%) → HIGH acceleration — rapidly changing audio, speech-like
Silence ratio ( 5.92%) → 56% silence — mix of speech and s
Spectral centroid std ( 4.27%) → HIGH variation — shifting frequency center
Mel mean ( 3.50%) → MODERATE energy — normal speech level
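
The percentages in this report look like per-feature importance shares, although the source does not say how they are computed. Assuming they are normalized importances, a report of this shape could be assembled roughly as follows. The importance values are copied from the sample report; the function names and layout are hypothetical and not taken from `src/explainer.py`.

```python
def top_features(importances, k=10):
    """Return the k largest (name, share) pairs, highest share first."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

def render(filename, label, confidence, importances, k=10):
    """Format a plain-text explanation report."""
    lines = [f"File: {filename}",
             f"Prediction: {label}",
             f"Confidence: {confidence:.2%}",
             "Why this decision was made:"]
    for name, share in top_features(importances, k):
        lines.append(f"  {name} ({share:.2%})")
    return "\n".join(lines)

# Importance shares taken from the sample report; the dict itself is illustrative.
importances = {
    "Mel mean": 0.0350,
    "MFCC Delta 1 std": 0.1063,
    "Silence ratio": 0.0592,
    "MFCC Delta 2 std": 0.0614,
    "Spectral centroid std": 0.0427,
}
print(render("speech_001.wav", "SPEECH", 0.9347, importances, k=3))
```

The plain-English phrases in the real output (for example "HIGH spectral change rate") would need an extra mapping from feature values to descriptions, which this sketch leaves out.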

nova-vad/
├── data/
│   ├── speech/          # raw speech files
│   ├── noise/           # raw noise files
│   ├── clean_speech/    # denoised speech
│   └── clean_noise/     # denoised noise
├── src/
│   ├── denoiser.py      # noise reduction pipeline
│   ├── vad.py           # WebRTC VAD baseline
│   ├── classifier.py    # NOVA-VAD 150+ features + ensemble
│   ├── explainer.py     # explainability layer
│   ├── benchmark.py     # head-to-head comparison
│   └── pipeline.py      # end-to-end runner
├── models/              # saved trained models
├── download_data.py     # automated dataset down

Existing VADs fail in three ways:

They break in noisy environments — WebRTC gets 58% on real-world noise
They are black boxes — no explanation of why a decision was made
They are too heavy for edge devices — Silero needs PyTorch (200MB+)

NOVA-VAD solves all three simultaneously. No existing open-source tool does this.

Denoiser pipeline
WebRTC VAD baseline
150+ feature MFCC classifier
Ensemble model (RF + GBT)
Explainability layer
Benchmark vs Silero, Pyannote, WebRTC
Real-time streaming audio support
pip install nova-vad packaging
Research paper

Monish

MIT License — free to use, modify, and distribute.

Read original on github.com → github.com/monishmal3375/nova-vad