16:38
2026-06-29
lesswrong.com
large-language-models
Gradient-free Single-pass Model Beats nanoGPT on Shakespeare
A new character-level language model called EntropyBeam, using gradient-free count tables and a Dirichlet prior, achieved a validation loss of 1.596 nats on the Shakespeare character benchmark, outper…