{"slug": "cuhksz-simultaneous-speech-translation-system-for-iwslt-2026", "title": "CUHKSZ Simultaneous Speech Translation System for IWSLT 2026", "summary": "The CUHKSZ team submitted a simultaneous speech translation system to IWSLT 2026, built on Qwen3-Omni-30B-A3B with LoRA adaptation, achieving 40.5 BLEU for English→Chinese and 27.7 BLEU for English→German in the 0–2 s latency regime. The system internalizes a read/write policy via syntax-aware supervision and uses a lightweight streaming agent for low-latency execution.", "body_md": "##### Abstract\n\nWe present the CUHKSZ Team submission to the IWSLT 2026 Simultaneous Speech Translation evaluation, targeting the main and Extra Context tracks for English→Chinese and English→German on unsegmented speech. Our system is built upon Qwen3-Omni-30B-A3B, a natively aligned audio-text LLM. Under the Constrained condition, we apply LoRA adaptation exclusively to the LLM. Specifically, we construct syntax-aware, chunk-aligned supervision from existing ASR corpora, using Qwen3-30B-Instruct to synthesize target translations. This enables the model to internalize the simultaneous read/write policy by autonomously predicting <wait> tokens at semantically incomplete boundaries. With the policy internalized, execution is delegated to a lightweight streaming agent served via vLLM. This agent feeds audio in fixed chunks, manages a bounded dialogue history, and enforces strict emission controls to minimize computation-aware delay. For the Extra Context sub-track, contextual priors are dynamically injected into the prompt. On the official dev set, our 0–2 s latency regime submissions achieve 40.5 BLEU (1.95 s) for En→Zh and 27.7 BLEU (1.72 s) for En→De. 
In the 2–4 s regime, performance scales to 42.1 BLEU (2.16 s) and 30.5 BLEU (2.29 s), respectively.\n\n- Anthology ID: 2026.iwslt-1.13\n- Volume: [Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)](/volumes/2026.iwslt-1/)\n- Month: July\n- Year: 2026\n- Address: San Diego, USA (in-person and online)\n- Editors: [Elizabeth Salesky](/people/elizabeth-salesky/), [Antonios Anastasopoulos](/people/antonios-anastasopoulos/), [Matteo Negri](/people/matteo-negri/), [Marcello Federico](/people/marcello-federico/)\n- Venues: [IWSLT](/venues/iwslt/) | [WS](/venues/ws/)\n- SIG: [SIGSLT](/sigs/sigslt/)\n- Publisher: Association for Computational Linguistics\n- Pages: 111–118\n- URL: [https://aclanthology.org/2026.iwslt-1.13/](https://aclanthology.org/2026.iwslt-1.13/)\n- Cite (ACL): Zeyu Yang and Satoshi Nakamura. 2026. [CUHKSZ Simultaneous Speech Translation System for IWSLT 2026](https://aclanthology.org/2026.iwslt-1.13/). In *Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)*, pages 111–118, San Diego, USA (in-person and online). Association for Computational Linguistics. 
- Cite (Informal): [CUHKSZ Simultaneous Speech Translation System for IWSLT 2026](https://aclanthology.org/2026.iwslt-1.13/) (Yang & Nakamura, IWSLT 2026)\n- PDF: [https://aclanthology.org/2026.iwslt-1.13.pdf](https://aclanthology.org/2026.iwslt-1.13.pdf)", "url": "https://wpnews.pro/news/cuhksz-simultaneous-speech-translation-system-for-iwslt-2026", "canonical_source": "https://aclanthology.org/2026.iwslt-1.13/", "published_at": "2026-06-30 00:00:00+00:00", "updated_at": "2026-07-01 10:09:48.681523+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "natural-language-processing", "ai-research"], "entities": ["CUHKSZ", "IWSLT", "Qwen3-Omni-30B-A3B", "LoRA", "vLLM", "Qwen3-30B-Instruct", "Zeyu Yang", "Satoshi Nakamura"], "alternates": {"html": "https://wpnews.pro/news/cuhksz-simultaneous-speech-translation-system-for-iwslt-2026", "markdown": "https://wpnews.pro/news/cuhksz-simultaneous-speech-translation-system-for-iwslt-2026.md", "text": "https://wpnews.pro/news/cuhksz-simultaneous-speech-translation-system-for-iwslt-2026.txt", "jsonld": "https://wpnews.pro/news/cuhksz-simultaneous-speech-translation-system-for-iwslt-2026.jsonld"}}