04:51
2026-06-17
github.com
large-language-models
GPT-2 124M checkpoint pre-trained on OpenWebText 27.5B tokens
A 124M-parameter GPT-2 model trained from scratch on OpenWebText data using a custom deep learning library achieved a validation loss of 2.764 nats and a perplexity of 15.87 after 56,000 steps (27.5B …
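
The loss and perplexity figures in the summary are related by perplexity = exp(loss) when the loss is a mean per-token cross-entropy in nats. The sketch below is a sanity check under that assumption, not code from the project. The function names are hypothetical, and the tokens-per-step figure is only an estimate derived from the rounded 27.5B total.

```python
import math

def perplexity_from_nats(loss_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss_nats)

def tokens_per_step(total_tokens: float, steps: int) -> float:
    """Average number of tokens consumed per optimizer step."""
    return total_tokens / steps

# Figures quoted in the summary (hypothetical variable names).
val_loss = 2.764          # validation loss in nats
steps = 56_000            # training steps
total_tokens = 27.5e9     # rounded total reported in the title

ppl = perplexity_from_nats(val_loss)
per_step = tokens_per_step(total_tokens, steps)

print(f"perplexity ~ {ppl:.2f}")
print(f"tokens per step ~ {per_step:,.0f}")
```

The computed perplexity agrees with the reported 15.87 to within the rounding of the three-decimal loss, and the per-step estimate comes out just under half a million tokens.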