[AURA-ST: Acoustic-Unconstrained Residual Architecture for Speech Translation](https://aclanthology.org/2026.iwslt-1.28.pdf)
[Barathi Ganesh HB](/people/barathi-ganesh-hb/),
[Michal Ptaszynski](/people/michal-ptaszynski/),
[Jairam R](/people/jairam-r/),
[Reshma Unnikrishnan](/people/reshma-unnikrishnan/)
Abstract
We present AURA-ST, a three-stage modular pipeline for low-resource speech-to-text translation submitted to the IWSLT 2026 African-Celtic Track 1. The architecture bypasses traditional cross-attention between audio and text modalities by treating projected acoustic representations as a native token prefix to a frozen large language model. First, a dual-stream encoder captures linguistic and paralinguistic features via jointly trained semantic and paralinguistic encoders. Second, a convolutional subsampler bridges the modality gap through 4x temporal compression and a linear projection into the LLM embedding space. Finally, an MLP-targeted Low-Rank Adaptation (LoRA) adapter adapts the frozen Gemma-4-E2B backbone for translation without catastrophic forgetting of base language model knowledge. We further identify and resolve an incompatibility between standard PEFT attention-level adapter injection and the Gemma-4 Per-Layer Embedding architecture, which tends to cause gradient isolation. Trained on the IWSLT 2026 Track 1 data covering Hausa, Igbo, and Yoruba, the final system achieves a best proxy teacher-forced SacreBLEU of 91.29 on the validation set at Phase 3, with the Phase 1 speech encoder validation loss converging to 0.651.
- Anthology ID:
- 2026.iwslt-1.28
- Volume:
[Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)](/volumes/2026.iwslt-1/)
- Month:
- July
- Year:
- 2026
- Address:
- San Diego, USA (in-person and online)
- Editors:
Elizabeth Salesky, Antonios Anastasopoulos, Matteo Negri, Marcello Federico
- Venues:
[IWSLT](/venues/iwslt/) | [WS](/venues/ws/)
- SIG:
[SIGSLT](/sigs/sigslt/)
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 247–254
- Language:
- URL:
[https://aclanthology.org/2026.iwslt-1.28/](https://aclanthology.org/2026.iwslt-1.28/)
- DOI:
- Cite (ACL):
- Barathi Ganesh HB, Michal Ptaszynski, Jairam R, and Reshma Unnikrishnan. 2026. AURA-ST: Acoustic-Unconstrained Residual Architecture for Speech Translation. In Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026), pages 247–254, San Diego, USA (in-person and online). Association for Computational Linguistics.
- Cite (Informal):
[AURA-ST: Acoustic-Unconstrained Residual Architecture for Speech Translation](https://aclanthology.org/2026.iwslt-1.28/) (HB et al., IWSLT 2026)
- PDF:
[https://aclanthology.org/2026.iwslt-1.28.pdf](https://aclanthology.org/2026.iwslt-1.28.pdf)
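
The abstract describes a subsampler that compresses the acoustic sequence 4x in time, projects it linearly into the LLM embedding space, and feeds the result to the language model as a token prefix. The following is a minimal, dependency-free sketch of that data flow, not the authors' implementation. It assumes the 4x compression comes from two stride-2 convolutions with a kernel of width 2 and no padding; the function names, the depthwise kernel, and all dimensions are hypothetical and chosen only for illustration. A real system would use learned convolution and projection layers in a deep learning framework.

```python
from typing import List

Vector = List[float]


def conv_subsample(frames: List[Vector], kernel: List[float], stride: int = 2) -> List[Vector]:
    """Depthwise 1-D convolution over time with the given stride and no padding.

    Assumption: the same scalar kernel is applied to every feature dimension.
    """
    k = len(kernel)
    dim = len(frames[0])
    out = []
    for start in range(0, len(frames) - k + 1, stride):
        window = frames[start:start + k]
        out.append([sum(kernel[j] * window[j][d] for j in range(k)) for d in range(dim)])
    return out


def linear_project(frames: List[Vector], weight: List[Vector], bias: Vector) -> List[Vector]:
    """Map each acoustic vector x to the LLM embedding size: y = W x + b."""
    return [
        [sum(w * x for w, x in zip(row, frame)) + b for row, b in zip(weight, bias)]
        for frame in frames
    ]


def build_prefix_input(
    audio_frames: List[Vector],
    text_embeddings: List[Vector],
    kernel: List[float],
    weight: List[Vector],
    bias: Vector,
) -> List[Vector]:
    """Return the sequence [acoustic prefix ; text embeddings] for the LLM."""
    x = conv_subsample(audio_frames, kernel, stride=2)  # T   -> T/2
    x = conv_subsample(x, kernel, stride=2)             # T/2 -> T/4
    prefix = linear_project(x, weight, bias)            # d_audio -> d_model
    # No cross-attention: acoustic vectors are simply prepended as tokens.
    return prefix + text_embeddings


# Toy dimensions (hypothetical): 16 frames, d_audio = 3, d_model = 4, 5 text tokens.
audio = [[float(t)] * 3 for t in range(16)]
text = [[0.0] * 4 for _ in range(5)]
kernel = [0.5, 0.5]
weight = [[1.0, 0.0, 0.0] for _ in range(4)]
bias = [0.0] * 4

sequence = build_prefix_input(audio, text, kernel, weight, bias)
print(len(sequence), len(sequence[0]))
```

With these toy inputs, the 16 acoustic frames shrink to 4 prefix vectors, which are then followed by the 5 text embeddings. Because the prefix lives in the same embedding space as the text tokens, the backbone can process the combined sequence with its ordinary self-attention.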