Transformers Learn the Mestre-Nagao Heuristic

Researchers trained a two-layer transformer encoder to classify rational elliptic curves as rank 0 or rank 1, achieving over 99% accuracy on both classes. The model learned the Mestre-Nagao heuristic from Frobenius trace data alone: the input weights of its top discriminating neuron match the heuristic weights with a Spearman coefficient of 0.997. Mechanistic interpretability revealed a sparse circuit of 20 out of 512 layer-1 MLP neurons implementing a push-pull detector architecture.

Published Jun 16, 2026

arXiv:2606.15036v1
Abstract: We train a two-layer transformer encoder to classify rational elliptic curves $E/\mathbb{Q}$ of conductor $\leq 10000$ as either rank 0 or rank 1 from the first 128 normalized Frobenius traces. We achieve >99% accuracy on both classes, and accuracy is essentially unchanged on test curves with no isogeny or quadratic-twist relative in the training set. We then apply techniques from mechanistic interpretability, namely attention analysis, linear probing, activation patching, logit attribution, and neuron-level circuit analysis, to reverse-engineer the algorithm that the model (the centroid in function space) learned. We find that a sparse circuit of 20 out of 512 layer-1 MLP neurons is sufficient for rank prediction under a linear probe, with an AUROC of 0.992 at plateau, implementing a push-pull detector architecture of rank-0 and rank-1 detectors with a one-sided readout. However, we observe that the model's readout is sub-optimal, indicating a mismatch in rank order between the readout pathway and the discriminative circuit. Critically, the learned input weights of the top discriminating neuron match the Mestre-Nagao sum heuristic weights $\log(p)/(p\cdot \log{B})$ with a Spearman coefficient $r = 0.997$ and a Pearson coefficient $r = 0.952$: the model has learned a result from analytic number theory from the Frobenius trace data alone. We additionally find that all 50 independently trained models concentrate CLS attention on prime positions at 2 to 50 times the rate of composite positions. The CLS embedding encodes $\log{L(E,1)}$ with $R^2 = 0.962\pm 0.011$ across the 50 models (after controlling for the conductor). Activation patching analysis reveals that attention weights are dissociated from causal information flow. Additionally, the 50 solutions from training are near-identical in function space (with pairwise agreement >98.8%) despite large weight space barriers.
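For readers unfamiliar with the heuristic, the weights $\log(p)/(p\cdot \log{B})$ quoted in the abstract can be illustrated with a small Python sketch. It is not the authors' code. It assumes a curve in short Weierstrass form $y^2 = x^3 + ax + b$, computes the Frobenius traces $a_p$ by naive point counting, skips $p = 2$ and the primes dividing the discriminant of that model, and uses one common normalization of the Mestre-Nagao sum, $S(B) = \sum_{p \leq B} a_p \log(p)/(p \log B)$, which is heuristically expected to approach $1/2 - \mathrm{rank}$ for large $B$. The function names and the two sample curves are illustrative choices, and the paper's exact normalization of the traces may differ.

```python
import math


def primes_up_to(n):
    """Return all primes <= n using a simple sieve of Eratosthenes."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            # Mark every multiple of i, starting at i*i, as composite.
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(sieve) if flag]


def mestre_nagao_weights(primes, B):
    """Heuristic weights log(p) / (p * log(B)) from the abstract."""
    return [math.log(p) / (p * math.log(B)) for p in primes]


def frobenius_trace(a, b, p):
    """a_p = p + 1 - #E(F_p) for y^2 = x^3 + a*x + b over F_p, p odd.

    Uses a_p = -sum_x chi(x^3 + a*x + b), where chi is the quadratic
    character mod p, evaluated with Euler's criterion. This is O(p)
    per prime, which is fine for a demonstration only.
    """
    total = 0
    for x in range(p):
        v = (x * x * x + a * x + b) % p
        if v == 0:
            continue
        total += 1 if pow(v, (p - 1) // 2, p) == 1 else -1
    return -total


def mestre_nagao_sum(a, b, B):
    """Weighted sum of a_p over odd primes p <= B of good reduction.

    Assumption: p = 2 and primes dividing the discriminant of this
    short Weierstrass model are skipped.
    """
    disc = -16 * (4 * a ** 3 + 27 * b ** 2)
    total = 0.0
    for p in primes_up_to(B):
        if p == 2 or disc % p == 0:
            continue
        total += math.log(p) / (p * math.log(B)) * frobenius_trace(a, b, p)
    return total


if __name__ == "__main__":
    B = 500
    # y^2 = x^3 - x has rank 0; y^2 = x^3 - 16x + 16 is a short
    # Weierstrass model of the conductor-37 curve of rank 1.
    for label, (a, b) in {"rank 0": (-1, 0), "rank 1": (-16, 16)}.items():
        print(label, round(mestre_nagao_sum(a, b, B), 3))
```

Because the weights fall off roughly like $\log(p)/p$, small primes dominate the sum. A Spearman coefficient only measures agreement in the ordering of the weights, so it can be higher than the Pearson coefficient when the learned weights follow the same ranking without being exactly proportional.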

source & further reading

Read original on arxiv.org → arxiv.org/abs/2606.15036
