Source: lesswrong.com · Topic: large-language-models

Gradient-free Single-pass Model Beats nanoGPT on Shakespeare

A new character-level language model called EntropyBeam, built from gradient-free count tables smoothed with a Dirichlet prior, reached a validation loss of 1.596 nats on nanoGPT's Shakespeare character benchmark. That beats a small 60K-parameter nanoGPT baseline (2.065 nats) while using zero trainable parameters and about 1,500x fewer total FLOPs.

3 min read · published Jun 29, 2026

EntropyBeam is a character-level language model built from count tables: for each context order (context length), it maps every observed character context to the frequencies of the characters that followed it.

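At prediction time, each order looks up the current context in its count table and produces a distribution over the vocabulary, smoothed with a symmetric Dirichlet prior with parameter α:

$$p_o(j) = \frac{c_o(j) + \alpha}{n_o + \alpha V}$$

where $c_o(j)$ is the number of times character $j$ followed the current context at order $o$, $n_o = \sum_j c_o(j)$ is the total count for that context, and $V$ is the vocabulary size.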
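A minimal sketch of the counting and smoothing steps, assuming that the context for order $k$ is simply the preceding $k$ characters and that α is added to every vocabulary entry. The helper names and the toy corpus are hypothetical; this is an illustration, not the code from the zw5/beam repository.

```python
from collections import Counter, defaultdict


def build_count_tables(text, orders):
    """Hypothetical helper: for each order k, map every k-character context
    to a Counter of the characters that followed it in `text`."""
    tables = {k: defaultdict(Counter) for k in orders}
    for i, next_char in enumerate(text):
        for k in orders:
            if i >= k:  # need k preceding characters for an order-k context
                tables[k][text[i - k:i]][next_char] += 1
    return tables


def smoothed_distribution(table, context, vocab, alpha):
    """Symmetric-Dirichlet (add-alpha) smoothing: (c_j + alpha) / (n + alpha * V).
    An unseen context yields the uniform distribution."""
    counts = table.get(context, Counter())  # .get does not insert into the defaultdict
    n = sum(counts.values())
    denom = n + alpha * len(vocab)
    return {ch: (counts[ch] + alpha) / denom for ch in vocab}


text = "abracadabra"  # toy corpus, for illustration only
vocab = sorted(set(text))  # 5 characters
tables = build_count_tables(text, orders=(0, 1, 2))
p = smoothed_distribution(tables[1], "a", vocab, alpha=0.5)
```

In the toy corpus, "a" is followed by "b" twice and by "c" and "d" once each, so with α = 0.5 and V = 5 the smoothed probability of "b" is 2.5 / 6.5 ≈ 0.385, and every unseen character still keeps 0.5 / 6.5 of the mass.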

Each order receives a capacity score built from two terms. The first is concentration:

$$C_o = 1 - \frac{H(p_o)}{\log V}$$

where $H(p_o)$ is the Shannon entropy of the smoothed distribution and $V$ is the vocabulary size. $C_o$ is 1 when all mass is on one token and 0 when the distribution is uniform.
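A short sketch of concentration as normalized entropy, $1 - H(p)/\log V$, which has exactly the endpoints described (1 for a one-hot distribution, 0 for a uniform one). The function name is hypothetical, and the distributions are plain dicts as in a toy setup.

```python
import math


def concentration(dist):
    """Normalized-entropy concentration: 1 - H(p) / log V.
    Returns 1.0 for a one-hot distribution and 0.0 for a uniform one.
    Assumes V >= 2 so that log V > 0."""
    probs = list(dist.values())
    entropy = -sum(p * math.log(p) for p in probs if p > 0)  # nats; 0 * log 0 taken as 0
    return 1.0 - entropy / math.log(len(probs))


peaked = concentration({"a": 0.97, "b": 0.01, "c": 0.01, "d": 0.01})
flat = concentration({"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25})
```

Dividing by $\log V$ makes the score independent of vocabulary size, so orders can be compared on the same 0 to 1 scale.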

The second term is reliability, $R_o$, a function of $n$, the total count for the current context at order $o$. It is 0 when the context has not been observed and saturates toward 1 as evidence accumulates.

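Capacity is the product of the two terms, $\kappa_o = C_o \cdot R_o$, where $C_o$ is the concentration and $R_o$ the reliability of order $o$. The capacity scores are converted to weights via a softmax at temperature τ = 0.10:

$$w_o = \frac{\exp(\kappa_o / \tau)}{\sum_{o'} \exp(\kappa_{o'} / \tau)}$$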
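A sketch of capacity and softmax routing. The article does not give the exact reliability formula, so `reliability` below uses an assumed form, $n/(n + 5)$, chosen only because it is 0 for an unseen context and saturates toward 1; the constant and all function names are hypothetical.

```python
import math


def reliability(n, half_point=5.0):
    # HYPOTHETICAL functional form. The article only states that reliability is 0
    # for an unseen context (n = 0) and saturates toward 1 as n grows;
    # n / (n + half_point) has both properties. half_point is illustrative.
    return n / (n + half_point)


def capacity(conc, n):
    """Capacity = concentration * reliability."""
    return conc * reliability(n)


def routing_weights(capacities, tau=0.10):
    """Softmax over per-order capacities at temperature tau."""
    m = max(capacities)  # subtract the max before exponentiating, for stability
    exps = [math.exp((c - m) / tau) for c in capacities]
    total = sum(exps)
    return [e / total for e in exps]


# Three hypothetical orders: a short context seen often, a longer context seen
# 40 times, and an unseen context (uniform distribution, so concentration 0).
caps = [capacity(0.30, 5000), capacity(0.55, 40), capacity(0.0, 0)]
sharp = routing_weights(caps, tau=0.10)
soft = routing_weights(caps, tau=1.0)
```

With τ = 0.10 the middle order receives roughly 86% of the weight; at τ = 1.0 it gets about 41%. Lowering the temperature is what pushes the routing toward winner-take-all.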

The low temperature makes the routing nearly winner-take-all: the highest-capacity order almost always dominates. The final prediction is the weighted geometric mean of the per-order distributions:

$$p(j) = \frac{\prod_o p_o(j)^{w_o}}{\sum_k \prod_o p_o(k)^{w_o}}$$

The geometric mean was chosen deliberately: it assigns high probability to a token only when the weighted orders agree, because a low probability from any heavily weighted order pulls the product down.
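The agreement property can be seen by pooling two disagreeing distributions both ways. This sketch implements a normalized weighted geometric mean and, only for contrast, a weighted arithmetic mean (a mixture); the function names and example distributions are hypothetical.

```python
import math


def geometric_pool(dists, weights):
    """Normalized weighted geometric mean: p(j) proportional to prod_o p_o(j) ** w_o.
    Assumes every probability is strictly positive (true after Dirichlet smoothing)."""
    vocab = list(dists[0])
    log_scores = {j: sum(w * math.log(d[j]) for d, w in zip(dists, weights)) for j in vocab}
    m = max(log_scores.values())  # shift in log space before exponentiating
    unnorm = {j: math.exp(s - m) for j, s in log_scores.items()}
    z = sum(unnorm.values())
    return {j: u / z for j, u in unnorm.items()}


def arithmetic_pool(dists, weights):
    """Weighted arithmetic mean (a mixture), shown only for contrast."""
    vocab = list(dists[0])
    return {j: sum(w * d[j] for d, w in zip(dists, weights)) for j in vocab}


# Two equally weighted orders that disagree about "x".
order_a = {"x": 0.90, "y": 0.05, "z": 0.05}
order_b = {"x": 0.02, "y": 0.49, "z": 0.49}
geo = geometric_pool([order_a, order_b], [0.5, 0.5])
mix = arithmetic_pool([order_a, order_b], [0.5, 0.5])
```

The mixture still ranks "x" first (0.46), while the geometric mean drops it to about 0.30, below "y" and "z" (about 0.35 each), because `order_b` gives "x" almost no mass.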

The model has four hyperparameters: the set of context orders, α, τ, and the reliability threshold (min_count = 1). These were selected by evaluating variants on the validation set.

Evaluation uses the nanoGPT shakespeare_char benchmark: character-level Shakespeare, about 1M training tokens, about 100K validation tokens, and a vocabulary size of 65.
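Validation loss is cross-entropy in nats: the average negative natural-log probability the model assigns to each validation character. A minimal sketch with a hypothetical `prob_of_next` interface. For reference, a uniform guess over the 65-character vocabulary scores ln 65 ≈ 4.17 nats, close to nanoGPT's reported step-0 loss of 4.189.

```python
import math


def cross_entropy_nats(prob_of_next, text):
    """Mean negative log-likelihood in nats over every position of `text`.
    prob_of_next(text, i) must return the model's probability of text[i]
    given text[:i]."""
    total = 0.0
    for i in range(len(text)):
        total -= math.log(prob_of_next(text, i))
    return total / len(text)


V = 65  # shakespeare_char vocabulary size


def uniform_model(text, i):
    """Baseline that ignores context and spreads mass evenly over V characters."""
    return 1.0 / V


loss = cross_entropy_nats(uniform_model, "To be, or not to be")
```

Any model that has learned something about the text should score below this uniform baseline.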

EntropyBeam uses 0 trainable parameters, a single fit pass, and character-level input.

| Training tokens | Validation loss (nats) | Contexts stored | Transitions stored |
|---|---|---|---|
| 1,000 | 2.954 | 5,495 | 6,388 |
| 3,000 | 2.654 | 14,670 | 17,176 |
| 10,000 | 2.482 | 44,092 | 51,835 |
| 30,000 | 2.289 | 120,043 | 140,961 |
| 100,000 | 2.193 | 346,462 | 405,119 |
| 300,000 | 1.990 | 919,897 | 1,071,750 |
| 1,003,854 | 1.596 | 2,753,581 | 3,199,496 |

The nanoGPT baseline uses 60,192 parameters, 2 layers, n_embd=48, n_head=4, block_size=32, batch_size=16, and AdamW with lr=1e-3 and wd=0.01.
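The 60,192 figure can be reproduced with a nanoGPT-style parameter count, but only under assumptions the article does not state: bias=False in the linear and LayerNorm layers, an MLP hidden size of 4 × n_embd, token embeddings tied to the output head, learned position embeddings counted in the total, and vocab_size = 65. A rough sketch with a hypothetical helper:

```python
def gpt_param_count(vocab_size, block_size, n_layer, n_embd, bias=False):
    """Parameter count for a nanoGPT-style GPT with tied token/output embeddings.
    n_head does not appear: splitting into heads does not change weight shapes."""
    b = 1 if bias else 0
    wte = vocab_size * n_embd            # token embedding, shared with the output head
    wpe = block_size * n_embd            # learned position embedding
    ln = n_embd * (1 + b)                # one LayerNorm: weight (+ bias)
    attn = (n_embd * 3 * n_embd + b * 3 * n_embd   # fused q, k, v projection
            + n_embd * n_embd + b * n_embd)        # attention output projection
    mlp = (n_embd * 4 * n_embd + b * 4 * n_embd    # expand to 4 * n_embd
           + 4 * n_embd * n_embd + b * n_embd)     # project back to n_embd
    per_layer = 2 * ln + attn + mlp      # two LayerNorms per block
    return wte + wpe + n_layer * per_layer + ln    # plus the final LayerNorm


baseline = gpt_param_count(vocab_size=65, block_size=32, n_layer=2, n_embd=48)
with_bias = gpt_param_count(vocab_size=65, block_size=32, n_layer=2, n_embd=48, bias=True)
```

Under these assumptions the bias-free count matches 60,192 exactly, while enabling biases would give 61,296.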

| Step | Tokens seen | Validation loss (nats) |
|---|---|---|
| 0 | 0 | 4.189 |
| 300 | 153,600 | 2.507 |
| 600 | 307,200 | 2.409 |
| 1,200 | 614,400 | 2.262 |
| 1,800 | 921,600 | 2.162 |
| 2,400 | 1,228,800 | 2.096 |
| 3,000 | 1,536,000 | 2.065 |

| Metric | EntropyBeam | nanoGPT | Ratio |
|---|---|---|---|
| Fit/train FLOPs | 0.009 G | 614 G | 68,000x |
| FLOPs per prediction | 4,500 | 133,000 | 30x |
| Total FLOPs to result | ~0.5 G | ~760 G | ~1,500x |
| Validation loss (nats) | 1.596 | 2.065 | |
| Trainable parameters | 0 | 60,192 | |
| Wall time | 12 s | 26 s | |
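The derived columns follow from the raw figures. Tokens seen is step × batch_size × block_size (512 tokens per step), so 3,000 steps amount to roughly 1.5 passes over the 1,003,854 training tokens. The ratio column is the quotient of the two FLOP figures, rounded. A quick arithmetic check, with all values taken from the tables:

```python
# Figures copied from the tables above.
batch_size, block_size = 16, 32
train_tokens = 1_003_854

tokens_per_step = batch_size * block_size  # 512 tokens per optimizer step


def tokens_seen(step):
    """Tokens processed by nanoGPT after `step` optimizer steps."""
    return step * tokens_per_step


passes_over_data = tokens_seen(3_000) / train_tokens  # about 1.53 epochs

fit_ratio = 614 / 0.009              # train vs fit FLOPs, both in GFLOPs
per_prediction_ratio = 133_000 / 4_500
total_ratio = 760 / 0.5              # approximate totals, in GFLOPs
```

The unrounded ratios are about 68,222, 29.6, and 1,520, matching the rounded 68,000x, 30x, and ~1,500x.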

Per-decade improvement in validation loss:

| Range (training tokens) | Change in loss (nats) |
|---|---|
| 1K to 10K | -0.47 |
| 10K to 100K | -0.29 |
| 100K to 1M | -0.60 |
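The per-decade changes come straight from the EntropyBeam scaling table; a short check in Python, using only the table's values:

```python
# Validation loss (nats) at each training-set size, from the scaling table.
loss_at = {1_000: 2.954, 10_000: 2.482, 100_000: 2.193, 1_003_854: 1.596}

decades = [(1_000, 10_000), (10_000, 100_000), (100_000, 1_003_854)]
change = {(start, end): loss_at[end] - loss_at[start] for start, end in decades}
```

The raw differences are -0.472, -0.289, and -0.597, which round to the table's -0.47, -0.29, and -0.60. The last interval is slightly more than a decade because the full training split has 1,003,854 tokens.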

Storage is not directly comparable to a transformer's parameter count: EntropyBeam stores about 2.75M contexts and 3.2M transitions, compared with about 60K learned floats for the transformer. Even so, on this corpus the fixed combination rule achieves lower cross-entropy than learned optimization.

EntropyBeam was not compared against a broad range of transformer baselines, but in limited testing on larger datasets it achieved similar next-token prediction accuracy.

The code is available at https://github.com/zw5/beam
