00:00
2026-05-27
djdumpling.github.io
machine-learning
hunting for headroom on modded-nanoGPT (WR #82)
A team of researchers from Yale's Building AI Infrastructure class achieved a new speed record of 81.2 seconds on the modded-nanoGPT benchmark by introducing a learnable per-(layer, head) tanh(Ξ±) gateβ¦