Open TTS Models: A Comprehensive 2026 Comparison of Kokoro, Supertonic 3, Qwen3-TTS, and the Broader Landscape
The open text-to-speech (TTS) landscape shifted dramatically between 2024 and mid-2026. A field once dominated by proprietary APIs from ElevenLabs, Google, and OpenAI now includes open-weight models that rival commercial quality while offering substantially lower cost and greater deployment flexibility. This report compares the leading open realtime TTS models as of May 2026, focusing on three representative architectures: Kokoro-82M, an ultra-lightweight decoder-only model; Supertonic 3, an ONNX-based flow-matching engine optimized for edge deployment; and Qwen3-TTS, a large-scale dual-track transformer family from Alibaba.