GPT-2

mentions 45 · type Organization · page 2/3

// recent coverage 45 mentions

10:36

2026-06-17

danluu.com

artificial-intelligence

Against essential and accidental complexity (2020)

In a 2020 analysis of Fred Brooks' 1986 essay 'No Silver Bullet,' the author argues that Brooks' claim of a 2x limit on programmer productivity improvements is flawed. The author contends that Brooks …

04:51

2026-06-17

github.com

large-language-models

GPT-2 124M checkpoint pre-trained on OpenWebText 27.5B tokens

A 124M-parameter GPT-2 model trained from scratch on OpenWebText data using a custom deep learning library achieved a validation loss of 2.764 nats and a perplexity of 15.87 after 56,000 steps (27.5B …
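
The two figures quoted in this summary are related: perplexity is the exponential of the cross-entropy loss when that loss is measured in nats. A minimal sketch of the conversion follows; the function name is hypothetical and the input is the rounded loss from the summary, so the result only approximates the reported perplexity.

```python
import math


def perplexity_from_nats(loss_nats: float) -> float:
    """Convert a mean cross-entropy loss in nats to perplexity."""
    return math.exp(loss_nats)


# 2.764 is the rounded validation loss quoted in the summary above.
ppl = perplexity_from_nats(2.764)
print(f"perplexity ~ {ppl:.2f}")
```

The result lands within about 0.01 of the reported 15.87, which is consistent with the loss having been rounded to three decimals.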

04:00

2026-06-17

arxiv.org

ai-safety

Rift: A Conflict Signature for Deception in Language Models

Researchers have identified a conflict signature in language models that distinguishes deceptive outputs from honest errors, achieving 100% accuracy in detecting lies across multiple models i…

04:00

2026-06-17

arxiv.org

large-language-models

Nothing from Something: Can a Language Model Discover 0?

Researchers tested whether language models can independently discover the concept of zero through arithmetic generalization. GPT-2-sized models failed at test time regardless of language pretraining, …

02:11

2026-06-17

gilesthomas.com

machine-learning

Flax debugging: making a hash of things

A developer debugging a JAX/Flax NNX training loop discovered that the loss was stuck at 10.82, indicating the model was performing no better than random guessing. The issue was traced to the training…
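
One way to see why a loss of 10.82 reads as random guessing: a model that spreads probability uniformly over a vocabulary of size V has a cross-entropy of ln(V) nats. The sketch below assumes the standard GPT-2 vocabulary of 50,257 tokens; the summary does not state which tokenizer the training loop used, so that size is an assumption.

```python
import math

# Assumption: the standard GPT-2 BPE vocabulary size (not stated in the summary).
GPT2_VOCAB_SIZE = 50257


def uniform_baseline_loss(vocab_size: int) -> float:
    """Cross-entropy in nats of a uniform guess over vocab_size tokens."""
    return math.log(vocab_size)


baseline = uniform_baseline_loss(GPT2_VOCAB_SIZE)
print(f"uniform-guess loss ~ {baseline:.2f} nats")
```

Under that assumption the baseline comes out at roughly 10.82 nats, matching the stuck loss value in the summary.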

14:51

2026-06-15

discuss.huggingface.co

large-language-models

Cross-architectural runtime probability dynamics in transformer LLMs — two clusters not explained by parameter count

A new measurement framework reveals that eight open-source transformer LLMs partition into two distinct clusters based on runtime probability distribution dynamics, with an order-of-magnitude gap in G…

20:45

2026-06-14

marktechpost.com

large-language-models

A Coding Hands-On on FineWeb for Streaming, Filtering, Deduplication, Tokenization, and Large-Scale Web Corpus Analytics

A tutorial demonstrates streaming, filtering, deduplication, tokenization, and analytics on the FineWeb dataset using Python, reproducing quality-filtering pipelines and MinHash-based near-duplicate d…

04:00

2026-06-12

arxiv.org

artificial-intelligence

RoVE: Rotary Value Embeddings Attention for Relative Position-dependent Value Pathways

Researchers have introduced Rotary Value Embeddings (RoVE), a parameter-free modification to Rotary Position Embeddings (RoPE) that makes value pathways position-sensitive by rotating values simultane…

00:00

2026-06-12

mindstudio.ai

artificial-intelligence

Diffusion Language Models Explained: How Google's Diffusion Gemma Works

Google released Diffusion Gemma in early 2025 as its first open-weight diffusion language model, using a masked diffusion approach that generates text by starting with noise and iteratively refining i…

00:00

2026-06-09

andlukyane.com

machine-learning

Book Review: 50 ML Projects to Understand LLMs

Mike X Cohen's new book "50 ML Projects to Understand LLMs" uses GPT-2 as a scientific specimen, teaching readers to investigate the model through 50 hands-on projects focused on code, statistics, and…

04:00

2026-06-05

arxiv.org

large-language-models

Trajectory Dynamics in Language Model Hidden States Predict Human Processing Costs Beyond Surprisal

Researchers introduced trajectory extrapolation error, a measure of how much a language model's hidden states deviate from a linear path during word processing, and found it independently predicts hum…

04:00

2026-06-05

arxiv.org

artificial-intelligence

Epidemiology of Model Collapse: Modeling Synthetic Data Contamination via Bilayer SIR Dynamics

Researchers have developed a bilayer SIR/SIRS epidemiological model to analyze how AI models collapse when trained on synthetic data generated by other AI systems, treating data corpora and AI models …

05:42

2026-06-04

arxiv.org

large-language-models

Arithmetic Pedagogy for Language Models

Researchers trained a small GPT-2 model on arithmetic problems using an Indonesian pedagogy called GASING, which breaks down calculations into left-to-right steps aligned with token generation. The 86…

15:05

2026-06-03

pytorch.org

machine-learning

Using Muon Optimizer with DeepSpeed

DeepSpeed has integrated the Muon Optimizer, a memory-efficient optimizer that uses a single momentum buffer and Newton-Schulz orthogonalization to improve training convergence, particularly for 2D we…

14:15

2026-05-30

dev.to

artificial-intelligence

ai, deepseek, machinelearning

Chinese AI labs have progressed from early BERT-era models to trillion-parameter systems like Wu Dao 2.0 (1.75T parameters) and cost-efficient architectures such as DeepSeek V3 (trained for $5.6M), ac…

06:00

2026-05-30

lesswrong.com

large-language-models

Ablating Induction Heads Leads to an increase in Local Repetition

Ablating induction heads in the GPT-2 small language model increased local repetition in the model's output, a finding validated through activation patching and comparison with random head ablation. T…

22:49

2026-05-28

news.ycombinator.com

large-language-models

Ask HN: Is Claude Opus 4.8 broken?

A user reported that Claude Opus 4.8 is exhibiting severe performance degradation, including an inability to read files, hallucinating file paths, and repeatedly generating errors with incorrect comma…

16:11

2026-05-28

cryptobriefing.com

artificial-intelligence

Epoch AI projects model serving to surpass building by 2030

Epoch AI projects that compute power for running AI models will surpass compute power for building them by 2030, with nearly half of inference operations shifting to specialized ASICs while training c…

00:45

2026-05-27

heyhyper.ai

ai-products

Show HN: Hyper, the self driving company brain

Shalin and Kanyes, cofounders of Hyper, launched a self-serve version of their "second brain" AI software that integrates personal context to automate tasks. The founders, who previously built robots …

04:38

2026-05-23

dev.to

large-language-models

Diffusion Language Models: How NVIDIA Nemotron-Labs Diffusion Shatters the Autoregressive Speed Ceiling

NVIDIA released Nemotron-Labs Diffusion on May 23, 2026, a family of diffusion language models (DLMs) that generate entire blocks of tokens in parallel and iteratively refine them, rather than produci…

// co-occurs with top 8 entities

arXiv (6), GPT-4 (5), BERT (4), Hacker News (3), Phi-1.5 (3), OPT-125M (3), Hugging Face (2), LLaMA (2)