Higgs Audio v3 TTS 4B: Built for voice chat


[ARTICLE · art-22141] src=boson.ai ↗ pub=2026-06-05T03:38Z topic=artificial-intelligence verified=true sentiment=↑ positive


Boson AI released Higgs Audio v3 TTS, a text-to-speech model designed for voice chat that generates expressive conversational speech across 100+ languages with zero-shot voice cloning and inline control over emotion, style, and prosody. The model achieves single-digit word error rates on multilingual benchmarks and posts the highest overall win rate in conversational behavior evaluations, ahead of systems including Fish Audio S2 Pro and Qwen3-TTS-1.7B, although it does not lead every category. The release addresses the need for TTS systems that can produce real-time, expressive speech for voice AI agents rather than simply reading text aloud.

3 min read · 18 views · published Jun 5, 2026

Higgs Audio v3 TTS is built for voice chat: it speaks, not just reads. It turns model responses into expressive conversational speech across 100+ languages, with zero-shot voice cloning and inline control over emotion, style, prosody, pauses, and sound effects.

Voice AI needs a different kind of text-to-speech. In a live conversation, speech is not just the last step after text generation. It is how the agent answers, reacts, pauses, emphasizes, and carries the turn.

Higgs Audio v3 TTS is built for that setting: beyond reading, toward real speech for voice AI. It keeps the reliability of a production TTS system, but it is designed to speak model responses in the moment, with the timing and expression that make an agent feel conversational.

The model is directly controllable from the text stream. Inline tags can change emotion, style, speed, pitch, pauses, and sound effects mid-utterance, so developers can shape how a response is spoken without leaving the generation flow.

Out of the box, Higgs Audio v3 TTS reaches single-digit WER/CER on 100+ languages. Across Seed-TTS, CV3, MiniMax-Multilingual, and Higgs-Multilingual, v3 sets the lowest WER against Higgs Audio v2 and a broad comparison set of open and commercial systems.

#### Multilingual Benchmarks

We evaluate Higgs Audio v3 TTS on public multilingual TTS suites and our internal 111-language Higgs-Multilingual set, covering both common and lower-resource languages.
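For readers unfamiliar with the metric, the sketch below shows one common way to compute WER/CER and macro-average it across languages. It is an illustration only: the article does not describe Boson's scoring pipeline, so the text normalization, the per-language pooling, and the function names `edit_distance`, `error_rate`, and `macro_average` are all assumptions made here.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference token
                            curr[j - 1] + 1,      # insertion of a hypothesis token
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(pairs: List[Tuple[str, str]], unit: str = "word") -> float:
    """Error rate over (reference, transcript) pairs, scaled x100.

    unit="word" gives WER; unit="char" gives CER (spaces ignored).
    Assumption: errors are pooled over all utterances of one language.
    """
    errors = total = 0
    for ref, hyp in pairs:
        if unit == "word":
            r, h = ref.split(), hyp.split()
        else:
            r, h = list(ref.replace(" ", "")), list(hyp.replace(" ", ""))
        errors += edit_distance(r, h)
        total += len(r)
    return 100.0 * errors / total


def macro_average(per_language: Dict[str, float]) -> float:
    """Unweighted mean, so every language counts equally."""
    return sum(per_language.values()) / len(per_language)


# Toy data, not taken from any benchmark.
samples = {
    "en": [("the cat sat", "the cat sat"), ("hello world", "hello word")],
    "de": [("guten morgen", "guten morgen")],
}
scores = {lang: error_rate(pairs) for lang, pairs in samples.items()}
print(scores, macro_average(scores))
```

Because the macro average weights each language equally, a lower-resource language with few test utterances influences the headline number as much as a high-resource one.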

The table reports macro-averaged WER/CER (↓, ×100). Lower is better; highlighted cells mark the best result per row. Non-Higgs bests are selected from Fish Audio S2 Pro, Qwen3-TTS-1.7B, VibeVoice-7B, IndexTTS-2, MiMo-Audio-7B-Instruct, MOSS-TTS-v1.5, OmniVoice, ChatterBox, and FireRedTTS-2.

| Benchmark | # langs | Higgs Audio v2 | Higgs Audio v3 | Non-Higgs Best Model |
|---|---|---|---|---|

[CV3](https://github.com/FunAudioLLM/CV3-Eval) · [MiniMax-Multilingual](https://arxiv.org/abs/2505.07916)

Higgs Audio v3's results have been reproduced by the [SGLang-Omni team](https://www.lmsys.org/blog/2026-06-04-higgs-audio-v3-tts/).

#### Conversational Behavior Benchmarks

Emergent TTS evaluates conversational behaviors that are hard to capture with transcript accuracy alone, including emotion, foreign words, paralinguistic cues, complex pronunciation, questions, and syntactic complexity.

Win-rate (↑) per category, measured as judge preference versus the baseline row. For a fair comparison, every model shares the same reference audio per prompt, and we run the benchmark text verbatim with no inline control tags inserted.

| Category | Higgs Audio v3 | Fish Audio S2 Pro | Qwen3-TTS-1.7B | IndexTTS-2 | MOSS-TTS-v1.5 | OmniVoice |
|---|---|---|---|---|---|---|
| Overall ↑ | 53.65% | 43.80% | 38.84% | 31.12% | 43.89% | 40.82% |
| Emotions ↑ | 53.75% | 53.04% | 45.54% | 39.29% | 60.54% | 61.07% |
| Foreign Words ↑ | 48.75% | 33.93% | 24.64% | 5.36% | 35.18% | 28.75% |
| Paralinguistics ↑ | 68.57% | 53.75% | 44.29% | 42.50% | 51.43% | 52.68% |
| Complex Pronunciation ↑ | 25.10% | 18.16% | 30.00% | 12.45% | 11.63% | 13.67% |
| Questions ↑ | 61.43% | 55.00% | 53.39% | 45.89% | 53.21% | 45.00% |
| Syntactic Complexity ↑ | 60.71% | 45.71% | 34.11% | 38.93% | 47.32% | 40.36% |
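As a rough guide to reading these numbers, the sketch below shows how a per-category win rate against a baseline could be tallied from pairwise judge verdicts. The article does not specify the scoring rule, so several things here are assumptions: the verdict labels `"win"`, `"tie"`, and `"loss"`, the half credit given to ties, the pooled "Overall" row, and the function name `win_rates`.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Assumed scoring: a tie counts as half a win for the candidate model.
VERDICT_SCORE = {"win": 1.0, "tie": 0.5, "loss": 0.0}


def win_rates(judgments: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return win rate (percent) per category plus a pooled "Overall".

    Each judgment is (category, verdict), where the verdict is the judge's
    preference for the candidate model versus the baseline on one prompt.
    """
    score = defaultdict(float)
    count = defaultdict(int)
    for category, verdict in judgments:
        for key in (category, "Overall"):
            score[key] += VERDICT_SCORE[verdict]
            count[key] += 1
    return {key: 100.0 * score[key] / count[key] for key in count}


# Toy verdicts, not taken from the benchmark.
toy = [
    ("Emotions", "win"),
    ("Emotions", "loss"),
    ("Questions", "win"),
    ("Questions", "tie"),
]
rates = win_rates(toy)
print(rates)
```

Under this reading, a value above 50% means the judge preferred the model over the baseline more often than not in that category, and a value below 50% means the baseline was preferred.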

Try It

The fastest way to hear Higgs Audio v3 TTS is in Boson Workspace. Choose a voice, enter a response, and experiment with emotion, prosody, pauses, and style tags directly in the browser.

We’re currently stabilizing the playground. Thanks for your patience.

When you are ready to build, use the Boson API. The endpoint supports blocking and streaming generation, voice cloning from audio references, and the same inline controls used in the workspace.

For local inference, the model weights are available on Hugging Face. You can serve them with SGLang-Omni using the Higgs TTS cookbook.

##### Acknowledgments

Contributors: Silin Meng, Ke Bai, Ruskin Raj Manku, Huapeng Zhou, Jonah Mackey, Dongming Shen, Erik Li, Weisu Yin, Yizhi Liu, Xinyu Wang, Alex Chen, Jaewon Lee, Lindsey Allen, Mu Li

Special thanks to the broader Boson team for supporting training and evaluation, the SGLang-Omni team for optimizing inference, and MKTech for development support.

Read original on boson.ai → www.boson.ai/blog/higgs-audio-v3-tts
