grep -l @gpt-2 /news/*.json | wc -l → 45

GPT-2

mentions 45 · type Organization · page 2/3

// recent coverage 45 mentions

10:36
2026-06-17
danluu.com
artificial-intelligence

Against essential and accidental complexity (2020)

In a 2020 analysis of Fred Brooks' 1986 essay 'No Silver Bullet,' the author argues that Brooks' claim of a 2x limit on programmer productivity improvements is flawed. The author contends that Brooks …

04:51
2026-06-17
github.com
large-language-models

GPT-2 124M checkpoint pre-trained on OpenWebText 27.5B tokens

A 124M-parameter GPT-2 model trained from scratch on OpenWebText data using a custom deep learning library achieved a validation loss of 2.764 nats and a perplexity of 15.87 after 56,000 steps (27.5B …
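The two figures quoted in this summary are related: when the loss is a mean cross-entropy measured in nats, perplexity is its exponential. A minimal check of that relationship, assuming the standard definition applies to this checkpoint (the function name is hypothetical, not taken from the repository):

```python
import math


def perplexity_from_nats(loss_nats: float) -> float:
    """Perplexity for a mean cross-entropy loss measured in nats."""
    return math.exp(loss_nats)


# Validation loss quoted in the summary above.
reported_loss = 2.764
print(f"perplexity ~ {perplexity_from_nats(reported_loss):.2f}")
```

This comes out at roughly 15.86, which agrees with the reported 15.87 to within the rounding of the three-decimal loss.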

04:00
2026-06-17
arxiv.org
ai-safety

Rift: A Conflict Signature for Deception in Language Models

Researchers have identified a conflict signature in language models that distinguishes deceptive outputs from honest errors, achieving 100% accuracy in detecting lies across multiple models i…

04:00
2026-06-17
arxiv.org
large-language-models

Nothing from Something: Can a Language Model Discover 0?

Researchers tested whether language models can independently discover the concept of zero through arithmetic generalization. GPT-2-sized models failed at test time regardless of language pretraining, …

02:11
2026-06-17
gilesthomas.com
machine-learning

Flax debugging: making a hash of things

A developer debugging a JAX/Flax NNX training loop discovered that the loss was stuck at 10.82, indicating the model was performing no better than random guessing. The issue was traced to the training…
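Why a loss near 10.82 points to random guessing: a model that spreads probability uniformly over a vocabulary of V tokens has a cross-entropy of ln(V) nats. The sketch below assumes the standard GPT-2 vocabulary of 50,257 tokens, which this summary does not state, and uses a hypothetical helper name:

```python
import math

# Assumption: the standard GPT-2 BPE vocabulary size; not stated in the summary.
GPT2_VOCAB_SIZE = 50257


def uniform_guess_loss(vocab_size: int) -> float:
    """Cross-entropy (in nats) of a uniform prediction over vocab_size tokens."""
    return math.log(vocab_size)


print(f"uniform-guess loss ~ {uniform_guess_loss(GPT2_VOCAB_SIZE):.2f} nats")
```

Under that assumption the baseline rounds to 10.82, so a training loss pinned at that value suggests the model's predictions are no better than uniform.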

00:00
2026-06-09
andlukyane.com
machine-learning

Book Review: 50 ML Projects to Understand LLMs

Mike X Cohen's new book "50 ML Projects to Understand LLMs" uses GPT-2 as a scientific specimen, teaching readers to investigate the model through 50 hands-on projects focused on code, statistics, and…

05:42
2026-06-04
arxiv.org
large-language-models

Arithmetic Pedagogy for Language Models

Researchers trained a small GPT-2 model on arithmetic problems using an Indonesian pedagogy called GASING, which breaks down calculations into left-to-right steps aligned with token generation. The 86…

15:05
2026-06-03
pytorch.org
machine-learning

Using Muon Optimizer with DeepSpeed

DeepSpeed has integrated the Muon Optimizer, a memory-efficient optimizer that uses a single momentum buffer and Newton-Schulz orthogonalization to improve training convergence, particularly for 2D we…

14:15
2026-05-30
dev.to
artificial-intelligence

ai, deepseek, machinelearning

Chinese AI labs have progressed from early BERT-era models to trillion-parameter systems like Wu Dao 2.0 (1.75T parameters) and cost-efficient architectures such as DeepSeek V3 (trained for $5.6M), ac…

22:49
2026-05-28
news.ycombinator.com
large-language-models

Ask HN: Is Claude Opus 4.8 broken?

A user reported that Claude Opus 4.8 is exhibiting severe performance degradation, including an inability to read files, hallucinating file paths, and repeatedly generating errors with incorrect comma…

16:11
2026-05-28
cryptobriefing.com
artificial-intelligence

Epoch AI projects model serving to surpass building by 2030

Epoch AI projects that compute power for running AI models will surpass compute power for building them by 2030, with nearly half of inference operations shifting to specialized ASICs while training c…

00:45
2026-05-27
heyhyper.ai
ai-products

Show HN: Hyper, the self driving company brain

Shalin and Kanyes, cofounders of Hyper, launched a self-serve version of their "second brain" AI software that integrates personal context to automate tasks. The founders, who previously built robots …
