Lilian Weng

mentions: 1 · type: Person

// recent coverage (1 mention)

00:02

2026-06-29

ianbarber.blog

large-language-models

It’s always the learning rates

Scaling laws predict training loss as model size, dataset size, and compute scale, but their practical application is sensitive to hyperparameter choices like learning rate. Lilian Weng's post highlig…

// co-occurs with top 7 entities

Yang (1), Hu (1), Zhou (1), Xing (1), Shanghai AI Lab (1), BERT (1), GPT-3 (1)