Voicebox: The Open-Source AI Voice Studio That Just Hit 28K Stars

[ARTICLE · art-14278] src=dev.to ↗ pub=2026-05-26T09:45Z topic=ai-tools verified=true sentiment=↑ positive

Voicebox, an open-source AI voice studio with 28,500 GitHub stars and an MIT license, runs entirely on local hardware and combines voice cloning, dictation, and text-to-speech across 23 languages. The project ships seven TTS engines, including Qwen3-TTS for multilingual cloning and Kokoro for CPU-only operation, along with a built-in MCP server that lets AI agents like Claude Code and Cursor speak through cloned voices with customizable personalities. Voicebox also includes a global hotkey for local dictation using Whisper-based speech-to-text, with support for Apple Silicon, NVIDIA, AMD, and Intel Arc hardware.

3 min read · published May 26, 2026

I've been watching the voice AI space for a while. ElevenLabs does voice cloning incredibly well. WisprFlow nails voice dictation. But both live in the cloud, both cost money every month, and both require uploading your voice data to someone else's server.

That's why Voicebox caught my attention. 28.5k GitHub stars, MIT license, and it runs entirely on your machine. It combines what ElevenLabs does (voice output) with what WisprFlow does (voice input), ties them together with a local LLM, and wraps everything in a polished desktop app.

Voice cloning needs only a few seconds of reference audio. Upload a short clip, and Voicebox builds a voice model that sounds like you. It covers 23 languages, including English, Chinese, Japanese, Arabic, Hindi, and Swahili.

Under the hood, Voicebox ships with 7 TTS engines:

Engine	Best For
Qwen3-TTS	High-quality multilingual cloning, natural-language delivery instructions
Chatterbox Turbo	Emotion tags (`[laugh]`, `[sigh]`, `[gasp]`) for expressive speech
LuxTTS	Lightweight (~1GB VRAM), 48kHz, 150x realtime on CPU
Kokoro	82M model, 50 curated preset voices, runs on CPU
TADA	HumeAI speech-language model, 700s+ coherent audio
Qwen CustomVoice	Delivery control without reference audio
Chatterbox Multilingual	23 languages, broadest coverage

If you don't want to clone anything, there are 50+ preset voices ready to go. And after generating audio, you get a full effects panel — reverb, delay, compression, pitch shift, chorus — all powered by Spotify's Pedalboard library, with real-time preview.

This is the feature that made me actually excited.

Voicebox ships a built-in MCP (Model Context Protocol) server. Any MCP-compatible agent — Claude Code, Cursor, Cline, Windsurf — can call it to speak. Setup takes one command:

claude mcp add voicebox \
  --transport http \
  --url http://127.0.0.1:17493/mcp \
  --header "X-Voicebox-Client-Id: claude-code"

After that, your agent can speak through your cloned voice. "Tests passed, ready to merge" — in a voice you chose.
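To make the transport a little less magical, here is a minimal sketch of the first message an MCP client would send to that endpoint. MCP is JSON-RPC 2.0 under the hood, and a session starts with an `initialize` call. The URL and the `X-Voicebox-Client-Id` header come from the command above; the protocol revision, the `clientInfo` values, and the helper name `build_initialize_request` are my assumptions, not something taken from Voicebox's docs. The function only builds the request and sends nothing.

```python
import json

# Endpoint taken from the `claude mcp add` command above.
MCP_URL = "http://127.0.0.1:17493/mcp"


def build_initialize_request(client_id: str, protocol_version: str = "2025-03-26") -> dict:
    """Build (but do not send) an MCP `initialize` request over HTTP."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            # Assumption: the server accepts this MCP revision.
            "protocolVersion": protocol_version,
            "capabilities": {},
            # Hypothetical client name and version, for illustration only.
            "clientInfo": {"name": client_id, "version": "0.1.0"},
        },
    }
    headers = {
        "Content-Type": "application/json",
        # An HTTP MCP server may reply with plain JSON or an SSE stream.
        "Accept": "application/json, text/event-stream",
        # Header name copied from the setup command.
        "X-Voicebox-Client-Id": client_id,
    }
    return {"url": MCP_URL, "headers": headers, "body": json.dumps(body).encode("utf-8")}


if __name__ == "__main__":
    request = build_initialize_request("claude-code")
    print(request["url"])
    print(request["body"].decode("utf-8"))
```

In practice Claude Code or Cursor handles this handshake for you; the sketch is only meant to show that the integration is ordinary HTTP on localhost.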

You can assign different voices to different agents. Hear one voice for your code reviewer, another for your deployment bot. And the real kicker: voice personalities. Attach a persona description like "calm engineer" or "sarcastic code reviewer," and Voicebox's local LLM rewrites the agent's output to match that personality before synthesizing speech. Your agents don't just sound different — they talk differently.
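I haven't read how Voicebox implements personas internally, so treat the following as a rough sketch of the idea rather than its actual code: look up a voice and persona per agent, then build the prompt a local LLM could use to restyle the message before synthesis. Every name here (`AgentVoice`, `AGENT_VOICES`, `plan_utterance`, the voice labels, the second client id) is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AgentVoice:
    voice: str    # hypothetical voice profile name
    persona: str  # free-text persona description; empty means no rewrite


# Hypothetical per-agent configuration, keyed by client id.
AGENT_VOICES = {
    "claude-code": AgentVoice(voice="my-clone", persona="calm engineer"),
    "deploy-bot": AgentVoice(voice="preset-7", persona="sarcastic code reviewer"),
}
DEFAULT_VOICE = AgentVoice(voice="default", persona="")


def plan_utterance(client_id: str, text: str) -> Tuple[str, Optional[str]]:
    """Return (voice, llm_prompt); llm_prompt is None when no rewrite is needed."""
    cfg = AGENT_VOICES.get(client_id, DEFAULT_VOICE)
    if not cfg.persona:
        return cfg.voice, None
    prompt = (
        f"Rewrite the following message in the style of a {cfg.persona}. "
        f"Keep every fact unchanged.\n\nMessage: {text}"
    )
    return cfg.voice, prompt


if __name__ == "__main__":
    voice, prompt = plan_utterance("claude-code", "Tests passed, ready to merge")
    print(voice)
    print(prompt)
```

The point of the split is that the persona changes the words and the voice changes the sound, and the two can be configured independently.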

Voicebox includes a global hotkey for dictation. Hold it, speak, release — text pastes into whatever text field you're focused on. On macOS, it uses the accessibility API for precise paste injection without touching your clipboard.

All dictation stays local. Whisper-based STT runs on your machine. An optional LLM refinement pass cleans up ums and stutters.
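To give a feel for what that refinement pass is doing, here is a deliberately crude stand-in that uses regular expressions instead of an LLM. It is not how Voicebox does it; it only strips a few common fillers and collapses immediately repeated words. The function name `clean_transcript` and the filler list are my own choices.

```python
import re

# A few common fillers, optionally followed by a comma or period.
FILLERS = re.compile(r"\b(?:um+|uh+|erm+)\b[,.]?\s*", re.IGNORECASE)
# A word repeated immediately one or more times ("I I think").
REPEATS = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)


def clean_transcript(text: str) -> str:
    """Remove filler words and stuttered repeats from a raw transcript."""
    text = FILLERS.sub("", text)
    text = REPEATS.sub(r"\1", text)
    # Collapse any leftover runs of whitespace.
    return re.sub(r"\s{2,}", " ", text).strip()


if __name__ == "__main__":
    print(clean_transcript("Um, so I I think uh we should ship it"))
    # prints: so I think we should ship it
```

A regex pass like this will also flatten legitimate repeats such as "had had", which is exactly the kind of case where a language model does better.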

Hardware	Backend
Apple Silicon	MLX (Metal, 4-5x speed)
NVIDIA GPU	CUDA
AMD GPU	ROCm
Intel Arc	IPEX/XPU
CPU only	Kokoro 82M works fine
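The table reads naturally as a fallback chain. Here is a small sketch of that selection logic, assuming a priority order (Apple Silicon, then NVIDIA, AMD, Intel Arc, then CPU) that I picked myself; Voicebox's real detection may differ. The `detect` helper uses the presence of vendor command-line tools as a rough heuristic and makes no attempt to detect Intel Arc.

```python
import platform
import shutil


def pick_backend(system: str, machine: str, has_nvidia: bool = False,
                 has_amd: bool = False, has_intel_arc: bool = False) -> str:
    """Map a hardware description to a backend label from the table above."""
    if system == "Darwin" and machine == "arm64":
        return "MLX"
    if has_nvidia:
        return "CUDA"
    if has_amd:
        return "ROCm"
    if has_intel_arc:
        return "IPEX/XPU"
    return "CPU"  # the CPU-only row, where Kokoro is the suggested engine


def detect() -> str:
    """Heuristic detection: checks for vendor tools on PATH (an assumption)."""
    return pick_backend(
        platform.system(),
        platform.machine(),
        has_nvidia=shutil.which("nvidia-smi") is not None,
        has_amd=shutil.which("rocm-smi") is not None,
    )


if __name__ == "__main__":
    print(detect())
```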

The app ships as a DMG for macOS and MSI for Windows. First launch auto-downloads the model weights you need: Kokoro is 82MB, Qwen3-TTS a few GB. The REST API and MCP server both listen on localhost:17493, with docs at http://127.0.0.1:17493/docs.
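Before wiring an agent to it, it can help to confirm the app is actually up. A minimal sketch using only the standard library: it derives the URLs mentioned in this article from the host and port, and checks whether anything accepts TCP connections there. The helper name `is_listening` is mine, and the check says nothing about whether the process on that port is really Voicebox.

```python
import socket

HOST, PORT = "127.0.0.1", 17493
BASE_URL = f"http://{HOST}:{PORT}"
DOCS_URL = f"{BASE_URL}/docs"  # API docs page mentioned in the article
MCP_URL = f"{BASE_URL}/mcp"    # MCP endpoint from the setup command


def is_listening(host: str = HOST, port: int = PORT, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if is_listening():
        print(f"Server is up, docs at {DOCS_URL}")
    else:
        print("Nothing is listening on the port; is the app running?")
```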

Voice I/O going local was always going to happen. Cloud convenience is real, but voice data is biometric data — losing it is closer to losing your fingerprint than losing your email. The fact that open-source TTS and STT models are now good enough to run on consumer hardware changes the equation.

Voicebox isn't just a useful tool. It's a proof point that agents don't have to be silent text boxes. They can speak, emote, and have personality — all without sending your voice to a data center.

Read original on dev.to → dev.to/hiroki-ii-ai/voicebox-the-open-source-ai-…