{"slug": "kits-submission-to-cross-lingual-voice-cloning-in-iwslt-2026", "title": "KIT’s Submission to Cross-Lingual Voice Cloning in IWSLT 2026", "summary": "Researchers from KIT submitted a cross-lingual voice cloning system to the IWSLT 2026 track, building on FishAudio-S2-Pro with language tag prompting, reinforcement learning fine-tuning, and reference-conditioned lexical matching. Language prompting yielded the largest gains, while lexical matching improved pronunciation of domain-specific terms. The work addresses intelligibility and naturalness challenges in preserving speaker identity across languages.", "body_md": "##### Abstract\n\nCross-lingual voice cloning aims to generate speech in a target language while preserving speaker identity from a source-language reference. This task is central to speech translation and is the focus of the IWSLT 2026 Cross-Lingual Voice Cloning track. A key challenge is maintaining intelligibility and naturalness in the presence of accent variation and domain-specific vocabulary. We build on a multilingual text-to-speech model, FishAudio-S2-Pro, and introduce language tag prompting to improve language control and reduce accent leakage. We further apply reinforcement learning (RL) fine-tuning for task adaptation and observe improvements in intelligibility. Finally, we propose a reference-conditioned lexical matching method that improves pronunciation of domain-specific terms when lexical overlap is present. 
Results show that language prompting provides the largest gains, while lexical matching yields consistent improvements on matched subsets.\n\n- Anthology ID: 2026.iwslt-1.8\n- Volume: [Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)](/volumes/2026.iwslt-1/)\n- Month: July\n- Year: 2026\n- Address: San Diego, USA (in-person and online)\n- Editors: [Elizabeth Salesky](/people/elizabeth-salesky/), [Antonios Anastasopoulos](/people/antonios-anastasopoulos/), [Matteo Negri](/people/matteo-negri/), [Marcello Federico](/people/marcello-federico/)\n- Venues: [IWSLT](/venues/iwslt/) | [WS](/venues/ws/)\n- SIG: [SIGSLT](/sigs/sigslt/)\n- Publisher: Association for Computational Linguistics\n- Pages: 78–83\n- URL: [https://aclanthology.org/2026.iwslt-1.8/](https://aclanthology.org/2026.iwslt-1.8/)\n- Cite (ACL): Seymanur Akti and Alexander Waibel. 2026. [KIT’s Submission to Cross-Lingual Voice Cloning in IWSLT 2026](https://aclanthology.org/2026.iwslt-1.8/). In *Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)*, pages 78–83, San Diego, USA (in-person and online). Association for Computational Linguistics.\n- Cite (Informal): [KIT’s Submission to Cross-Lingual Voice Cloning in IWSLT 2026](https://aclanthology.org/2026.iwslt-1.8/) (Akti & Waibel, IWSLT 2026)\n- PDF: [https://aclanthology.org/2026.iwslt-1.8.pdf](https://aclanthology.org/2026.iwslt-1.8.pdf)", "url": "https://wpnews.pro/news/kits-submission-to-cross-lingual-voice-cloning-in-iwslt-2026", "canonical_source": "https://aclanthology.org/2026.iwslt-1.8/", "published_at": "2026-06-30 00:00:00+00:00", "updated_at": "2026-06-30 18:52:53.222186+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "natural-language-processing", "ai-research", "generative-ai"], "entities": ["KIT", "FishAudio-S2-Pro", "IWSLT 2026", "Seymanur Akti", "Alexander Waibel", "Association for Computational Linguistics"], "alternates": {"html": "https://wpnews.pro/news/kits-submission-to-cross-lingual-voice-cloning-in-iwslt-2026", "markdown": "https://wpnews.pro/news/kits-submission-to-cross-lingual-voice-cloning-in-iwslt-2026.md", "text": "https://wpnews.pro/news/kits-submission-to-cross-lingual-voice-cloning-in-iwslt-2026.txt", "jsonld": "https://wpnews.pro/news/kits-submission-to-cross-lingual-voice-cloning-in-iwslt-2026.jsonld"}}