{"slug": "dots-tts-2b-parameter-continuous-end-to-end-autoregressive-tts-system", "title": "Dots.tts: 2B-parameter continuous, end-to-end autoregressive TTS system", "summary": "A 2-billion-parameter fully continuous, end-to-end autoregressive text-to-speech system called dots.tts has achieved state-of-the-art performance across multiple benchmarks, including the best average results on Seed-TTS-Eval with word error rates of 0.94% and 1.30% on Chinese and English test sets. The system, which pairs a semantic encoder, LLM, and autoregressive flow-matching acoustic head over a 48 kHz AudioVAE without discrete tokens, also attained the highest average speaker similarity of 83.9 on the 24-language MiniMax multilingual benchmark. This marks a significant advancement in open-source TTS technology, demonstrating strong generation stability, voice cloning ability, and emotional expressiveness.", "body_md": "# dots.tts\n\nA 2B-parameter fully continuous, end-to-end autoregressive text-to-speech system.\n\n**Abstract**\ndots.tts is a **2B-parameter fully continuous**, end-to-end\nautoregressive (AR) text-to-speech system. The backbone pairs a **semantic encoder**,\nan **LLM**, and an **autoregressive flow-matching acoustic head** over\na 48 kHz **AudioVAE**, with no discrete tokens anywhere in the pipeline.\n\ndots.tts achieves the **best average performance** on Seed-TTS-Eval,\nwith WERs of **0.94% / 1.30% / 6.60%** and SIM scores of **81.0 / 77.1 / 79.5**\non the zh / en / zh-hard test sets, respectively. It further attains the highest average speaker similarity\n(**83.9**) on the 24-language MiniMax multilingual benchmark. 
Across other benchmarks,\ndots.tts also consistently demonstrates **open-source state-of-the-art**\nperformance, exhibiting strong generation stability, voice cloning ability, and emotional expressiveness.", "url": "https://wpnews.pro/news/dots-tts-2b-parameter-continuous-end-to-end-autoregressive-tts-system", "canonical_source": "https://rednote-hilab.github.io/dots.tts-demo/", "published_at": "2026-06-06 04:53:53+00:00", "updated_at": "2026-06-06 05:16:18.820798+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "generative-ai", "natural-language-processing"], "entities": ["dots.tts", "Seed-TTS-Eval", "MiniMax"], "alternates": {"html": "https://wpnews.pro/news/dots-tts-2b-parameter-continuous-end-to-end-autoregressive-tts-system", "markdown": "https://wpnews.pro/news/dots-tts-2b-parameter-continuous-end-to-end-autoregressive-tts-system.md", "text": "https://wpnews.pro/news/dots-tts-2b-parameter-continuous-end-to-end-autoregressive-tts-system.txt", "jsonld": "https://wpnews.pro/news/dots-tts-2b-parameter-continuous-end-to-end-autoregressive-tts-system.jsonld"}}