Soft Token Alignment for Cross-Lingual Reasoning

wpnews.pro

Researchers propose SOLAR, an auxiliary objective for supervised fine-tuning that aligns soft-token representations across languages using English as a pivot, with the aim of improving cross-lingual reasoning in multilingual large language models. Across four benchmarks, SOLAR improves accuracy by up to +17.7 points over the base model and +3.8 over standard supervised fine-tuning, with the largest gains on low-resource languages.

Published Jun 26, 2026

arXiv:2606.26466v1. Abstract: Multilingual large language models often produce inconsistent reasoning and answers for semantically equivalent prompts in different languages. Prior work suggests that intermediate representations can be relatively language-agnostic, but generation becomes increasingly language-specific as models commit to discrete output tokens. This is problematic because language-specific lexical choices can cause semantically equivalent reasoning paths to diverge across languages. These divergences motivate searching for a cross-lingual alignment signal that is less tied to any single vocabulary item or script. We propose SOLAR, an auxiliary objective for supervised fine-tuning that aligns soft-token representations across languages, using English as a pivot. Soft tokens are probability-weighted mixtures over the vocabulary embeddings, yielding continuous representations that can aggregate information from semantically related tokens across languages. We then align each non-English soft-token summary to its English counterpart in the shared embedding space. Across four multilingual reasoning benchmarks, SOLAR improves accuracy by up to +17.7 points over the base model and +3.8 over standard supervised fine-tuning, with the largest gains on low-resource languages. SOLAR also strengthens final-layer cross-lingual similarity and substantially reduces language-cluster separability, suggesting that aligning soft-token representations helps preserve shared semantic structure during multilingual reasoning.
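To make the soft-token idea concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation: the abstract does not say how soft tokens are pooled into a summary or which distance is used for alignment, so the mean pooling in `summary` and the cosine penalty in `cosine_alignment_loss` are assumptions, as are all function names, the four-token vocabulary, and the two-dimensional embeddings.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_token(logits, embeddings):
    """Probability-weighted mixture of vocabulary embeddings for one position."""
    probs = softmax(logits)
    dim = len(embeddings[0])
    return [sum(p * row[d] for p, row in zip(probs, embeddings)) for d in range(dim)]

def summary(logit_seq, embeddings):
    """ASSUMPTION: mean-pool the soft tokens of a sequence into one vector."""
    toks = [soft_token(step, embeddings) for step in logit_seq]
    return [sum(t[d] for t in toks) / len(toks) for d in range(len(toks[0]))]

def cosine_alignment_loss(x, pivot):
    """ASSUMPTION: 1 - cosine similarity to the English pivot summary.

    Both vectors must be non-zero.
    """
    dot = sum(a * b for a, b in zip(x, pivot))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_p = math.sqrt(sum(b * b for b in pivot))
    return 1.0 - dot / (norm_x * norm_p)

# Hypothetical shared embedding table: tokens 0 and 1 stand for near-synonyms
# in two languages, tokens 2 and 3 for an unrelated concept.
EMBEDDINGS = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

english = summary([[4.0, 1.0, 0.0, 0.0]], EMBEDDINGS)    # pivot, peaks on token 0
aligned = summary([[1.0, 4.0, 0.0, 0.0]], EMBEDDINGS)    # peaks on the synonym
divergent = summary([[0.0, 0.0, 4.0, 1.0]], EMBEDDINGS)  # peaks elsewhere

loss_aligned = cosine_alignment_loss(aligned, english)
loss_divergent = cosine_alignment_loss(divergent, english)
print(f"aligned: {loss_aligned:.3f}  divergent: {loss_divergent:.3f}")
```

In this toy setup the two languages commit to different discrete tokens, yet their soft tokens land close together because the mixture draws on nearby embeddings, so the penalty for the synonym case is much smaller than for the divergent one. In actual fine-tuning, a term of this kind would presumably be added to the standard supervised loss with some weighting, which the abstract does not specify.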

source & further reading

arxiv.org — original article

Read original on arxiv.org → arxiv.org/abs/2606.26466
