AURA-ST: Acoustic-Unconstrained Residual Architecture for Speech Translation

wpnews.pro




Researchers presented AURA-ST, a three-stage modular pipeline for low-resource speech-to-text translation, at IWSLT 2026. The system bypasses traditional cross-attention by treating projected acoustic representations as a token prefix to a frozen large language model, and it achieves a best proxy teacher-forced SacreBLEU of 91.29 on the validation set covering Hausa, Igbo, and Yoruba.

2 min read · published Jun 30, 2026


[AURA-ST: Acoustic-Unconstrained Residual Architecture for Speech Translation](https://aclanthology.org/2026.iwslt-1.28.pdf)

[Barathi Ganesh HB](/people/barathi-ganesh-hb/),
[Michal Ptaszynski](/people/michal-ptaszynski/),
[Jairam R](/people/jairam-r/),
[Reshma Unnikrishnan](/people/reshma-unnikrishnan/)

Abstract

We present AURA-ST, a three-stage modular pipeline for low-resource speech-to-text translation submitted to the IWSLT 2026 African-Celtic Track 1. The architecture bypasses traditional cross-attention between audio and text modalities by treating projected acoustic representations as a native token prefix to a frozen large language model. A dual-stream encoder captures linguistic and paralinguistic features via a semantic encoder and a paralinguistic encoder trained jointly. A convolutional subsampler then bridges the modality gap through 4x temporal compression and a linear projection into the LLM embedding space. Finally, an MLP-targeted Low-Rank Adaptation (LoRA) adapter adapts the frozen Gemma-4-E2B backbone to translation without catastrophic forgetting of base language model knowledge. We further identify and resolve an incompatibility between standard PEFT attention-level adapter injection and the Gemma-4 Per-Layer Embedding architecture, which tends to cause gradient isolation. Trained on the IWSLT 2026 Track 1 data covering Hausa, Igbo, and Yoruba, the final system achieves a best proxy teacher-forced SacreBLEU of 91.29 on the validation set at Phase 3, with the Phase 1 speech encoder validation loss converging to 0.651.

- Anthology ID: 2026.iwslt-1.28
- Volume: [Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026)](/volumes/2026.iwslt-1/)
- Month: July
- Year: 2026
- Address: San Diego, USA (in-person and online)
- Editors: Elizabeth Salesky, Antonios Anastasopoulos, Matteo Negri, Marcello Federico
- Venues: [IWSLT](/venues/iwslt/) | [WS](/venues/ws/)
- SIG: [SIGSLT](/sigs/sigslt/)
- Publisher: Association for Computational Linguistics
- Pages: 247–254
- URL: [https://aclanthology.org/2026.iwslt-1.28/](https://aclanthology.org/2026.iwslt-1.28/)
- Cite (ACL): Barathi Ganesh HB, Michal Ptaszynski, Jairam R, and Reshma Unnikrishnan. 2026. AURA-ST: Acoustic-Unconstrained Residual Architecture for Speech Translation. In Proceedings of the 23rd International Conference on Spoken Language Translation (IWSLT 2026), pages 247–254, San Diego, USA (in-person and online). Association for Computational Linguistics.
- Cite (Informal): [AURA-ST: Acoustic-Unconstrained Residual Architecture for Speech Translation](https://aclanthology.org/2026.iwslt-1.28/) (HB et al., IWSLT 2026)
- PDF: [https://aclanthology.org/2026.iwslt-1.28.pdf](https://aclanthology.org/2026.iwslt-1.28.pdf)
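
The abstract describes acoustic features being compressed 4x in time, projected into the LLM embedding space, and placed in front of the text as a token prefix, so the frozen decoder reaches the audio through its ordinary self-attention rather than a dedicated cross-attention block. The following minimal Python sketch shows only the resulting sequence bookkeeping. It assumes a subsampler built from two stride-2 1-D convolutions (kernel 3, padding 1), which the abstract does not specify, and the common convention of masking non-target positions with the label -100; the names `subsampled_len`, `build_layout`, and `PrefixLayout` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

IGNORE_INDEX = -100  # PyTorch CrossEntropyLoss skips this label by default


def conv_out_len(length: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Output length of one 1-D convolution (standard formula, dilation 1)."""
    return (length + 2 * padding - kernel) // stride + 1


def subsampled_len(num_frames: int, num_layers: int = 2) -> int:
    """Hypothetical subsampler: num_layers stride-2 convs, i.e. 4x for two layers."""
    for _ in range(num_layers):
        num_frames = conv_out_len(num_frames)
    return num_frames


@dataclass
class PrefixLayout:
    num_prefix: int       # acoustic positions filled by projected encoder vectors
    input_ids: List[int]  # text token ids placed after the prefix (prompt + target)
    labels: List[int]     # one label per position, IGNORE_INDEX where no loss applies


def build_layout(num_frames: int, prompt_ids: List[int], target_ids: List[int]) -> PrefixLayout:
    """Lay out [acoustic prefix | prompt | target] and mask everything but the target."""
    num_prefix = subsampled_len(num_frames)
    text_ids = list(prompt_ids) + list(target_ids)
    # Prefix and prompt are conditioning context only; loss is computed on the target.
    labels = [IGNORE_INDEX] * (num_prefix + len(prompt_ids)) + list(target_ids)
    return PrefixLayout(num_prefix, text_ids, labels)
```

For 100 encoder frames, a two-token prompt, and a three-token target, `build_layout(100, [5, 6], [7, 8, 9])` yields 25 prefix positions and 30 labels, of which only the last three carry loss. In a real model the 25 prefix positions would hold projected encoder vectors rather than token ids, which is why the sketch tracks the prefix as a count.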
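
The abstract's MLP-targeted LoRA adapter leaves the backbone weights frozen and trains only a low-rank update. The following dependency-free sketch shows the standard LoRA forward pass, h = W0 x + (alpha / r) B A x, with A drawn from a small Gaussian and B initialised to zero as in the original LoRA formulation, so the adapted layer starts out computing exactly the frozen layer's output. The module names in `MLP_TARGETS` are an assumption based on the naming used for earlier Gemma models in Hugging Face transformers and have not been checked against Gemma-4-E2B; `LoRALinear` and `is_lora_target` are hypothetical helpers, not the PEFT API.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W0 plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, w0: Matrix, rank: int, alpha: float, seed: int = 0):
        d_out, d_in = len(w0), len(w0[0])
        rng = random.Random(seed)
        self.w0 = w0                 # frozen: never updated during fine-tuning
        self.scale = alpha / rank
        # A: small random init (rank x d_in); B: zeros (d_out x rank), so B @ A = 0 at step 0.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.b = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.w0, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * d for y, d in zip(base, delta)]


# Hypothetical target names: Gemma-family MLP projections in Hugging Face transformers
# are typically gate_proj / up_proj / down_proj; unverified for Gemma-4-E2B.
MLP_TARGETS = ("gate_proj", "up_proj", "down_proj")


def is_lora_target(module_name: str) -> bool:
    """Select MLP projections only; attention projections keep their frozen weights."""
    return module_name.rsplit(".", 1)[-1] in MLP_TARGETS
```

With `w0 = [[1.0, 2.0], [3.0, 4.0]]`, `LoRALinear(w0, rank=1, alpha=2.0).forward([1.0, 1.0])` returns `[3.0, 7.0]`, the same as the frozen weight alone, until training moves B away from zero. Only A and B would be passed to the optimizer, which keeps the trainable parameter count at r x (d_in + d_out) per adapted layer.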
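
The abstract qualifies its 91.29 as a proxy teacher-forced SacreBLEU. Assuming this means the hypothesis is formed from the model's prediction at each position while conditioning on the gold reference prefix (as in a training forward pass) rather than from autoregressive decoding, the following toy sketch contrasts the two regimes. It uses token accuracy in place of SacreBLEU to stay dependency-free, and the bigram "model" and all function names are hypothetical. Under teacher forcing an early mistake cannot propagate, so such scores are generally higher than scores from freely generated translations and are not directly comparable with them.

```python
from typing import Callable, List

NextToken = Callable[[List[str]], str]


def teacher_forced(model: NextToken, gold: List[str]) -> List[str]:
    """Predict each position from the gold history (what a training loss sees)."""
    return [model(gold[:i]) for i in range(1, len(gold))]


def free_running(model: NextToken, gold: List[str]) -> List[str]:
    """Greedy decoding: each prediction is fed back as history for the next step."""
    history = gold[:1]  # a copy holding only the start token
    for _ in range(len(gold) - 1):
        history.append(model(history))
    return history[1:]


def token_accuracy(pred: List[str], ref: List[str]) -> float:
    """Fraction of positions where prediction and reference agree."""
    return sum(p == r for p, r in zip(pred, ref)) / len(ref)


# Toy bigram "model" (hypothetical): the next token depends only on the last one.
BIGRAM = {"<s>": "a", "a": "dog", "dog": "ran", "ran": "off",
          "the": "cat", "cat": "sat", "sat": "down"}


def toy_model(history: List[str]) -> str:
    return BIGRAM.get(history[-1], "<unk>")
```

With `gold = ["<s>", "the", "cat", "sat", "down"]`, teacher forcing gives `["a", "cat", "sat", "down"]` (3 of 4 correct) while free-running decoding gives `["a", "dog", "ran", "off"]` (0 of 4): the same wrong first token is repaired at the next step under teacher forcing but derails the free-running output.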
