KIT’s Submission to Cross-Lingual Voice Cloning in IWSLT 2026

Researchers from KIT submitted a cross-lingual voice cloning system to the IWSLT 2026 track, building on FishAudio-S2-Pro with language tag prompting, reinforcement learning fine-tuning, and reference-conditioned lexical matching. Language prompting yielded the largest gains, while lexical matching improved pronunciation of domain-specific terms. The work addresses intelligibility and naturalness challenges in preserving speaker identity across languages.

Abstract

Cross-lingual voice cloning aims to generate speech in a target language while preserving speaker identity from a source-language reference. This task is central to speech translation and is the focus of the IWSLT 2026 Cross-Lingual Voice Cloning track. A key challenge is maintaining intelligibility and naturalness in the presence of accent variation and domain-specific vocabulary. We build on a multilingual text-to-speech model, FishAudio-S2-Pro, and introduce language tag prompting to improve language control and reduce accent leakage. We further apply reinforcement learning (RL) fine-tuning for task adaptation and observe improvements in intelligibility. Finally, we propose a reference-conditioned lexical matching method that improves pronunciation of domain-specific terms when lexical overlap is present.
Results show that language prompting provides the largest gains, while lexical matching yields consistent improvements on matched subsets.

- Anthology ID: 2026.iwslt-1.8
- Volume: Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)
- Month: July
- Year: 2026
- Address: San Diego, USA (in-person and online)
- Editors: Elizabeth Salesky, Antonios Anastasopoulos, Matteo Negri, Marcello Federico
- Venues: IWSLT | WS
- SIG: SIGSLT
- Publisher: Association for Computational Linguistics
- Pages: 78–83
- URL: https://aclanthology.org/2026.iwslt-1.8/
- Cite (ACL): Seymanur Akti and Alexander Waibel. 2026. KIT’s Submission to Cross-Lingual Voice Cloning in IWSLT 2026. In Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026), pages 78–83, San Diego, USA (in-person and online). Association for Computational Linguistics.
- Cite (Informal): KIT’s Submission to Cross-Lingual Voice Cloning in IWSLT 2026 (Akti & Waibel, IWSLT 2026)
- PDF: https://aclanthology.org/2026.iwslt-1.8.pdf