
Balancing Linguistic Intelligibility and Speaker Identity in Zero-Shot Cross-Lingual Voice Cloning

Researchers evaluated four state-of-the-art zero-shot cross-lingual voice cloning systems across six languages, finding that Arabic remains particularly challenging. The study, presented at IWSLT 2026, analyzed tradeoffs between speech accuracy and speaker identity preservation in autoregressive and diffusion-based architectures.

Published Jun 30, 2026
Abstract

Cross-lingual voice cloning (CLVC) aims to synthesize speech in a target language while preserving the vocal identity of a source speaker who has no recorded speech in that language. Despite recent advances in multilingual text-to-speech systems, zero-shot CLVC remains challenging due to phonetic divergence across languages and the difficulty of maintaining speaker identity alongside linguistic intelligibility. In this work, we present a systematic evaluation of four state-of-the-art CLVC systems spanning autoregressive and diffusion-based architectures. Using English source speakers from the ACL-60/60 dataset, we evaluate zero-shot voice transfer across multiple target languages, including Arabic, Chinese, French, German, Russian, and Japanese. Systems are assessed using speaker similarity and content consistency metrics under a unified multilingual evaluation pipeline. We analyze how two modeling approaches, autoregressive language modeling and diffusion-based flow matching, handle the tradeoff between speech accuracy and speaker identity preservation. We further observe substantial performance variation across languages, with Arabic remaining particularly challenging under zero-shot transfer settings.

- Anthology ID: 2026.iwslt-1.12
- Volume: [Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)](/volumes/2026.iwslt-1/)
- Month: July
- Year: 2026
- Address: San Diego, USA (in-person and online)
- Editors: Elizabeth Salesky, Antonios Anastasopoulos, Matteo Negri, Marcello Federico
- Venues: [IWSLT](/venues/iwslt/) | [WS](/venues/ws/)
- SIG: [SIGSLT](/sigs/sigslt/)
- Publisher: Association for Computational Linguistics
- Pages: 103–110
- URL: [https://aclanthology.org/2026.iwslt-1.12/](https://aclanthology.org/2026.iwslt-1.12/)
- Cite (ACL): [Balancing Linguistic Intelligibility and Speaker Identity in Zero-Shot Cross-Lingual Voice Cloning](https://aclanthology.org/2026.iwslt-1.12/) (Ahtasam et al., IWSLT 2026)
- PDF: [https://aclanthology.org/2026.iwslt-1.12.pdf](https://aclanthology.org/2026.iwslt-1.12.pdf)
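
The abstract names two evaluation axes, speaker similarity and content consistency, but does not say how either is computed. As a rough illustration only, the sketch below assumes speaker similarity is the cosine similarity between two speaker embedding vectors, and content consistency is an error rate between the target text and an ASR transcript of the synthesized audio. The function names, the toy vectors, and the choice of word versus character error rate are all hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalized by reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def word_error_rate(reference, hypothesis):
    """Assumed metric for whitespace-delimited languages."""
    return error_rate(reference.split(), hypothesis.split())


def char_error_rate(reference, hypothesis):
    """Assumed metric for languages without whitespace word boundaries."""
    return error_rate(list(reference), list(hypothesis))


# Toy inputs standing in for real speaker embeddings and ASR transcripts.
source_embedding = [0.2, 0.7, 0.1]
cloned_embedding = [0.25, 0.65, 0.05]
similarity = cosine_similarity(source_embedding, cloned_embedding)
wer = word_error_rate("the cat sat", "the dog sat")
```

In this framing, a system can score high on similarity while its error rate is also high, which is the kind of tradeoff between speaker identity and intelligibility the abstract describes. Character-level scoring is a common choice for Chinese and Japanese, where text is not split into words by spaces.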