
AURA-ST: Acoustic-Unconstrained Residual Architecture for Speech Translation

Researchers presented AURA-ST, a three-stage modular pipeline for low-resource speech-to-text translation, at IWSLT 2026. The system bypasses traditional cross-attention by feeding projected acoustic representations as a token prefix to a frozen large language model. On Hausa, Igbo, and Yoruba translation it reports a best proxy teacher-forced SacreBLEU of 91.29 on the validation set.
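To make the prefix idea concrete, here is a minimal NumPy sketch of how projected acoustic frames could be placed in front of embedded prompt tokens. It is an illustration under stated assumptions, not the authors' code: the sizes, the random weights, and the helper name `build_prefixed_input` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
D_AUDIO = 8    # width of the acoustic encoder output
D_MODEL = 16   # width of the LLM token embeddings
VOCAB = 32     # toy vocabulary size

# Stand-in for the frozen LLM embedding table (random here, assumption).
embed_table = rng.normal(size=(VOCAB, D_MODEL))

# Trainable linear projection from acoustic space into LLM embedding space.
proj_w = rng.normal(size=(D_AUDIO, D_MODEL)) * 0.1
proj_b = np.zeros(D_MODEL)


def build_prefixed_input(acoustic_frames, prompt_ids):
    """Project acoustic frames and prepend them to the embedded text prompt."""
    audio_prefix = acoustic_frames @ proj_w + proj_b   # (T_audio, D_MODEL)
    text_embeds = embed_table[prompt_ids]              # (T_text, D_MODEL)
    # One sequence for a decoder-only LLM: self-attention sees audio and text
    # positions alike, so no separate cross-attention module is needed.
    return np.concatenate([audio_prefix, text_embeds], axis=0)


frames = rng.normal(size=(5, D_AUDIO))   # 5 already-compressed acoustic frames
prompt = np.array([1, 7, 3])             # 3 prompt token ids
inputs = build_prefixed_input(frames, prompt)
print(inputs.shape)  # (8, 16)
```

The first five rows of `inputs` are the acoustic prefix and the last three are ordinary token embeddings, so the language model consumes a single sequence of eight positions.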

Published Jun 30, 2026 · 2 min read
[AURA-ST: Acoustic-Unconstrained Residual Architecture for Speech Translation](https://aclanthology.org/2026.iwslt-1.28.pdf)

[Barathi Ganesh HB](/people/barathi-ganesh-hb/),
[Michal Ptaszynski](/people/michal-ptaszynski/),
[Jairam R](/people/jairam-r/),
[Reshma Unnikrishnan](/people/reshma-unnikrishnan/)
Abstract

We present AURA-ST, a three-stage modular pipeline for low-resource speech-to-text translation submitted to the IWSLT 2026 African-Celtic Track 1. The architecture bypasses traditional cross-attention between audio and text modalities by treating projected acoustic representations as a native token prefix to a frozen large language model. A dual-stream encoder captures linguistic and paralinguistic features via jointly trained semantic and paralinguistic encoders. A convolutional subsampler then bridges the modality gap through a 4x temporal compression and a linear projection into the LLM embedding space. Finally, an MLP-targeted Low-Rank Adaptation adapter fine-tunes the frozen Gemma-4-E2B backbone for translation without catastrophic forgetting of base language model knowledge. We further identify and resolve the incompatibility between standard PEFT attention-level adapter injection and the Gemma-4 Per-Layer Embedding architecture that tends to cause gradient isolation. Trained on the IWSLT 2026 Track 1 data covering Hausa, Igbo, and Yoruba, the final system achieves a best proxy teacher-forced SacreBLEU of 91.29 on the validation set at Phase 3, with Phase 1 speech encoder validation loss converging to 0.651.

- Anthology ID:

  • 2026.iwslt-1.28
- Volume:
  • [Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)](/volumes/2026.iwslt-1/)
- Month:
  • July
- Year:
  • 2026
- Address:
  • San Diego, USA (in-person and online)
- Editors:
  • Elizabeth Salesky, Antonios Anastasopoulos, Matteo Negri, Marcello Federico
- Venues:
  • [IWSLT](/venues/iwslt/) | [WS](/venues/ws/)
- SIG:
  • [SIGSLT](/sigs/sigslt/)
- Publisher:
  • Association for Computational Linguistics
- Pages:
  • 247–254
- URL:
  • [https://aclanthology.org/2026.iwslt-1.28/](https://aclanthology.org/2026.iwslt-1.28/)
- Cite (ACL):
  • [AURA-ST: Acoustic-Unconstrained Residual Architecture for Speech Translation](https://aclanthology.org/2026.iwslt-1.28/) (HB et al., IWSLT 2026)
- PDF:
  • [https://aclanthology.org/2026.iwslt-1.28.pdf](https://aclanthology.org/2026.iwslt-1.28.pdf)
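
The abstract names two mechanisms that are easy to picture in code: a convolutional subsampler with 4x temporal compression, and a Low-Rank Adaptation (LoRA) update applied to MLP weights. The NumPy sketch below illustrates both under assumptions the abstract does not confirm. Reaching 4x through two stride-2 convolutions is a common design but is assumed here, and the kernel sizes, widths, rank, and the function names `conv1d_stride2`, `subsample_4x`, and `lora_linear` are hypothetical.

```python
import numpy as np


def conv1d_stride2(x, kernel):
    """Strided 1D convolution over time. x: (T, C_in), kernel: (K, C_in, C_out)."""
    k = kernel.shape[0]
    pad = k // 2
    x = np.pad(x, ((pad, pad), (0, 0)))        # zero-pad the time axis only
    starts = range(0, x.shape[0] - k + 1, 2)   # stride of 2 halves the length
    return np.stack([np.einsum("kc,kco->o", x[s:s + k], kernel) for s in starts])


def subsample_4x(x, k1, k2):
    """Two stride-2 convolutions with a ReLU in between: T frames -> about T/4."""
    h = np.maximum(conv1d_stride2(x, k1), 0.0)
    return conv1d_stride2(h, k2)


def lora_linear(x, w_frozen, lora_a, lora_b, alpha):
    """Frozen weight plus a trainable low-rank update: x W + (alpha / r) x A B."""
    r = lora_a.shape[1]
    return x @ w_frozen + (alpha / r) * (x @ lora_a @ lora_b)


rng = np.random.default_rng(0)

# Assumed toy input: 16 encoder frames of width 8.
frames = rng.normal(size=(16, 8))
k1 = rng.normal(size=(3, 8, 8)) * 0.1
k2 = rng.normal(size=(3, 8, 8)) * 0.1
compressed = subsample_4x(frames, k1, k2)
print(compressed.shape)  # (4, 8)

# Stand-in for one frozen MLP projection inside the backbone (assumption).
w_mlp = rng.normal(size=(8, 32))
lora_a = rng.normal(size=(8, 2)) * 0.01   # rank r = 2, trainable
lora_b = np.zeros((2, 32))                # zero init: adapter starts as a no-op
out = lora_linear(compressed, w_mlp, lora_a, lora_b, alpha=4.0)
print(np.allclose(out, compressed @ w_mlp))  # True
```

Because `lora_b` starts at zero, the adapted layer initially reproduces the frozen layer exactly. Only the two small matrices would receive gradient updates during training, which is the usual argument for why this style of adaptation preserves base-model knowledge.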