[ARTICLE · art-22174] src=arxiv.org pub= topic=natural-language-processing verified=true sentiment=· neutral

Efficient Punctuation Restoration via Weighted Lookahead Scoring Method for Streaming ASR Systems

Researchers have proposed a non-autoregressive punctuation restoration method for streaming automatic speech recognition (ASR) systems that scores candidate insertions with a weighted lookahead over bounded future context. At each word boundary, the method compares punctuation insertion hypotheses against a no-insertion baseline under a K-subword-token lookahead, so it makes incremental decisions without free-form generation and leaves the input transcript unchanged. On IWSLT 2017, it reaches a 4-class macro F1 of 0.893 without fine-tuning and 0.937 after fine-tuning (both at K=2), ahead of a prompt-based baseline (0.566) and a fine-tuned ELECTRA baseline (0.913) under the same lookahead budget. The design targets the latency and alignment failures that generation-based approaches show in streaming ASR.

read 1 min · published Jun 5, 2026

arXiv:2606.05179v1 Announce Type: new Abstract: Punctuation restoration improves ASR (Automatic Speech Recognition) readability. However, streaming ASR requires online decisions with limited future context. In streaming ASR, the system predicts punctuation incrementally, which makes generation-based approaches prone to latency and alignment failures under boundary-wise evaluation. This paper proposes a non-autoregressive scoring method (no free-form generation) that preserves the input transcript and makes a decision at each word boundary. Our method compares punctuation insertion hypotheses against a no-insertion baseline under a bounded K-subword-token lookahead, and calibrates decisions using a weight α and a validation-calibrated threshold τ (no parameter updates during inference). On IWSLT 2017, our scoring method achieves a 4-class macro F1 of 0.893 in the no fine-tuning setting (validation-calibrated, K=2) and 0.937 after fine-tuning (K=2), outperforming the prompt-based baseline (0.566) and a fine-tuned ELECTRA baseline (0.913) under the same lookahead budget. We analyze the impact of the lookahead budget through ablation studies on K.
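
The boundary-wise decision described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not give the scoring formula, so the margin used here (log-probability of the mark, plus α times the change in lookahead log-probability relative to the no-insertion baseline) is only one plausible form. The names `decide_boundary`, `restore`, `toy_logprob`, and `TOY_BIGRAMS` are hypothetical, the three-mark set is assumed from the 4-class setup, whole words stand in for subword tokens, and a toy bigram table replaces a real language model.

```python
import math
from typing import Callable, List, Optional, Sequence

# ASSUMPTION: three marks plus "no insertion" give the 4 classes.
MARKS = [",", ".", "?"]

# ASSUMPTION: a scorer returning log P(continuation | context).
LogProbFn = Callable[[Sequence[str], Sequence[str]], float]

# Toy bigram probabilities standing in for a real language model.
TOY_BIGRAMS = {
    ("are", "you"): 0.5,
    ("you", "?"): 0.4,
    ("?", "yes"): 0.5,
    (".", "yes"): 0.3,
    ("you", "yes"): 0.01,
}


def toy_logprob(context: Sequence[str], continuation: Sequence[str]) -> float:
    """Sum bigram log-probabilities; unseen pairs get a default of 0.05."""
    prev = context[-1] if context else "<s>"
    total = 0.0
    for tok in continuation:
        total += math.log(TOY_BIGRAMS.get((prev, tok), 0.05))
        prev = tok
    return total


def decide_boundary(left: Sequence[str], lookahead: Sequence[str],
                    logprob: LogProbFn, alpha: float = 1.0,
                    tau: float = 0.0, k: int = 2) -> Optional[str]:
    """Return the mark to insert at this boundary, or None for no insertion."""
    future = list(lookahead[:k])        # bounded lookahead of at most K tokens
    baseline = logprob(left, future)    # no-insertion hypothesis
    best_mark, best_margin = None, -math.inf
    for mark in MARKS:
        insert_lp = logprob(left, [mark])
        future_lp = logprob(list(left) + [mark], future)
        # ASSUMED margin: mark likelihood plus weighted lookahead gain.
        margin = insert_lp + alpha * (future_lp - baseline)
        if margin > best_margin:
            best_mark, best_margin = mark, margin
    # Insert only if the best hypothesis beats the calibrated threshold tau.
    return best_mark if best_margin > tau else None


def restore(words: Sequence[str], logprob: LogProbFn, alpha: float = 1.0,
            tau: float = 0.0, k: int = 2) -> List[str]:
    """Walk the transcript once, deciding at each word boundary."""
    out: List[str] = []
    for i, word in enumerate(words):
        out.append(word)                # input words are never altered
        mark = decide_boundary(out, words[i + 1:], logprob, alpha, tau, k)
        if mark is not None:
            out.append(mark)
    return out


print(restore(["are", "you", "yes", "i"], toy_logprob))
```

In this sketch each decision needs only the next K tokens, which is what bounds the added latency, and the transcript words pass through unchanged because the loop only ever appends marks between them. In practice α and τ would be tuned on validation data, as the abstract describes, rather than left at the placeholder defaults used here.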
