{"slug": "aura-st-acoustic-unconstrained-residual-architecture-for-speech-translation", "title": "AURA-ST: Acoustic-Unconstrained Residual Architecture for Speech Translation", "summary": "Researchers presented AURA-ST, a three-stage modular pipeline for low-resource speech-to-text translation at IWSLT 2026. The system bypasses traditional cross-attention by treating projected acoustic representations as token prefixes to a frozen large language model, achieving a best proxy teacher-forced SacreBLEU of 91.29 on the validation set for Hausa, Igbo, and Yoruba translation.", "body_md": "[AURA-ST: Acoustic-Unconstrained Residual Architecture for Speech Translation](https://aclanthology.org/2026.iwslt-1.28.pdf)\n\n[Barathi Ganesh HB](/people/barathi-ganesh-hb/),\n[Michal Ptaszynski](/people/michal-ptaszynski/),\n[Jairam R](/people/jairam-r/),\n[Reshma Unnikrishnan](/people/reshma-unnikrishnan/)\n\n##### Abstract\n\nWe present AURA-ST, a three-stage modular pipeline for low-resource speech-to-text translation submitted to the IWSLT 2026 African-Celtic Track 1. The architecture bypasses traditional cross-attention between audio and text modalities by treating projected acoustic representations as a native token prefix to a frozen large language model. A dual-stream encoder captures linguistic and paralinguistic features via a jointly trained semantic and a paralinguistic encoder. A convolutional subsampler then bridges the modality gap through a 4x temporal compression and a linear projection into the LLM embedding space. Finally, an MLP-targeted Low-Rank Adaptation adapter fine-tunes the frozen Gemma-4-E2B backbone for translation without catastrophic forgetting of base language model knowledge. We further identify and resolve the incompatibility between standard PEFT attention-level adapter injection and the Gemma-4 Per-Layer Embedding architecture that tends to cause gradient isolation. 
Trained on the IWSLT 2026 Track 1 data covering Hausa, Igbo, and Yoruba, the final system achieves a best proxy teacher-forced SacreBLEU of 91.29 on the validation set at Phase 3, with Phase 1 speech encoder validation loss converging to 0.651.\n\n- Anthology ID:\n- 2026.iwslt-1.28\n- Volume:\n[Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)](/volumes/2026.iwslt-1/)\n- Month:\n- July\n- Year:\n- 2026\n- Address:\n- San Diego, USA (in-person and online)\n- Editors:\n[Elizabeth Salesky](/people/elizabeth-salesky/), [Antonios Anastasopoulos](/people/antonios-anastasopoulos/), [Matteo Negri](/people/matteo-negri/), [Marcello Federico](/people/marcello-federico/)\n- Venues:\n[IWSLT](/venues/iwslt/) | [WS](/venues/ws/)\n- SIG:\n[SIGSLT](/sigs/sigslt/)\n- Publisher:\n- Association for Computational Linguistics\n- Note:\n- Pages:\n- 247–254\n- Language:\n- URL:\n[https://aclanthology.org/2026.iwslt-1.28/](https://aclanthology.org/2026.iwslt-1.28/)\n- DOI:\n- Cite (ACL):\n- Barathi Ganesh HB, Michal Ptaszynski, Jairam R, and Reshma Unnikrishnan. 2026.\n[AURA-ST: Acoustic-Unconstrained Residual Architecture for Speech Translation](https://aclanthology.org/2026.iwslt-1.28/). In *Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)*, pages 247–254, San Diego, USA (in-person and online). Association for Computational Linguistics. 
- Cite (Informal):\n[AURA-ST: Acoustic-Unconstrained Residual Architecture for Speech Translation](https://aclanthology.org/2026.iwslt-1.28/) (HB et al., IWSLT 2026)\n- PDF:\n[https://aclanthology.org/2026.iwslt-1.28.pdf](https://aclanthology.org/2026.iwslt-1.28.pdf)", "url": "https://wpnews.pro/news/aura-st-acoustic-unconstrained-residual-architecture-for-speech-translation", "canonical_source": "https://aclanthology.org/2026.iwslt-1.28/", "published_at": "2026-06-30 00:00:00+00:00", "updated_at": "2026-06-30 18:52:13.179412+00:00", "lang": "en", "topics": ["large-language-models", "natural-language-processing", "ai-research"], "entities": ["Barathi Ganesh HB", "Michal Ptaszynski", "Jairam R", "Reshma Unnikrishnan", "Gemma-4-E2B", "IWSLT 2026", "Association for Computational Linguistics"], "alternates": {"html": "https://wpnews.pro/news/aura-st-acoustic-unconstrained-residual-architecture-for-speech-translation", "markdown": "https://wpnews.pro/news/aura-st-acoustic-unconstrained-residual-architecture-for-speech-translation.md", "text": "https://wpnews.pro/news/aura-st-acoustic-unconstrained-residual-architecture-for-speech-translation.txt", "jsonld": "https://wpnews.pro/news/aura-st-acoustic-unconstrained-residual-architecture-for-speech-translation.jsonld"}}