Open TTS Models: A Comprehensive 2026 Comparison of Kokoro, Supertonic 3, Qwen3-TTS, and the Broader Landscape

The open text-to-speech (TTS) landscape shifted dramatically between 2024 and mid-2026, with open-weight models now rivaling proprietary APIs from ElevenLabs, Google, and OpenAI in quality while offering lower costs and greater deployment flexibility. A May 2026 comparison of leading open realtime TTS models highlights three representative architectures: Kokoro-82M, an ultra-lightweight decoder-only model; Supertonic 3, an ONNX-based flow-matching engine optimized for edge deployment; and Qwen3-TTS, a large-scale dual-track transformer family from Alibaba.

The open text-to-speech (TTS) landscape has undergone a dramatic transformation between 2024 and mid-2026. What was once dominated by proprietary APIs from ElevenLabs, Google, and OpenAI now features a competitive ecosystem of open-weight models that rival commercial quality while offering substantially lower cost and greater deployment flexibility. This report provides a comprehensive comparison of the leading open realtime TTS models as of May 2026, with particular focus on three representative architectures: Kokoro-82M (the ultra-lightweight decoder-only model), Supertonic 3 (the ONNX-based flow-matching engine optimized for edge deployment), and Qwen3-TTS (the large-scale dual-track transformer family from Alibaba).