Scaling Hypothesis #2: Are Humans Just More Over-Parameterized?

Source: lesswrong.com. Published: 2026-06-17T02:53Z. Topic: artificial intelligence.

A researcher proposes that human brains minimize bias through extreme overparameterization and high-learning-rate training on small diverse datasets, while LLMs minimize variance. This 'catapulting' hypothesis could explain differences in generalization and adversarial robustness, suggesting a new scaling paradigm for AI that may improve safety and efficiency.

(2024-04-21) There are many mysteries about deep learning and human intelligence, but we could describe the biggest anomaly this way: why are artificial neural nets smart in such stupid ways, and biological brains stupid but in smart ways?

I propose a major change in deep learning scaling paradigms. The architectural differences between human brains and NNs (particularly LLMs) may be due to a bias-variance tradeoff, in which LLMs minimize variance and human brains minimize bias. Human brains do this through deep double descent-style overparameterization, in effect adopting a scaling strategy of extremely high-learning-rate training of extremely overparameterized models on small, diverse, highly filtered datasets. This approach would travel (or 'catapult') sample-efficiently and compute-efficiently to a highly generalizing, human-like basin in the model loss landscape, while performing poorly up until the end and failing to memorize much data.
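For readers who want the tradeoff stated explicitly, here is the textbook bias-variance decomposition for squared-error loss. It is a standard result and not a formula taken from the essay; the symbols $f$ (true function), $\hat f$ (learned predictor), and $\sigma^2$ (irreducible noise variance) are notation assumed here, with $y = f(x) + \varepsilon$, $\mathbb{E}[\varepsilon] = 0$, and the expectation taken over training sets and noise.

```latex
\[
\mathbb{E}\!\left[\big(y - \hat f(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\big(\hat f(x) - \mathbb{E}[\hat f(x)]\big)^2\right]}_{\text{variance}}
  + \sigma^2
\]
```

In this notation, the hypothesis reads as a claim about which of the first two terms each kind of learner drives down: LLMs the variance term, human brains the bias term.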

If true, this would explain a number of odd stylized facts about where humans and NNs each perform well or poorly. Such a 'catapulted LLM' would generalize much better than existing NNs, be immune to adversarial attacks, have better economics, and be more resistant to cloning. It could potentially enable extremely efficient MLP architectures and, by giving true generalization, provide a sturdy foundation for AI safety in the form of useful NNs which are aligned & safe for the right reasons.

This could feasibly be tested by training multi-trillion-parameter models for relatively few steps with high, cyclical learning-rate schedules, and then benchmarking them on adversarial and hard examples in tasks like arithmetic and small-image classification.
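As a rough illustration of what a "cyclical learning rate schedule" could look like, the sketch below implements a simple triangular cycle in plain Python. The essay does not specify a schedule shape; the triangular form, the function name `triangular_lr`, and the parameters `base_lr`, `max_lr`, and `half_cycle` are all assumptions made here for illustration.

```python
def triangular_lr(step: int, base_lr: float, max_lr: float, half_cycle: int) -> float:
    """Hypothetical triangular cyclical schedule (an assumption, not the essay's recipe).

    The rate ramps linearly from base_lr up to max_lr over `half_cycle` steps,
    then back down to base_lr over the next `half_cycle` steps, and repeats.
    """
    if half_cycle <= 0:
        raise ValueError("half_cycle must be positive")
    # Position within the current full cycle, in [0, 2 * half_cycle).
    cycle_pos = step % (2 * half_cycle)
    # Normalised distance from the peak: 1.0 at the cycle start, 0.0 at the peak.
    dist = abs(cycle_pos - half_cycle) / half_cycle
    return base_lr + (max_lr - base_lr) * (1.0 - dist)


if __name__ == "__main__":
    # Print two full cycles of an illustrative schedule with a high peak rate.
    for step in range(17):
        print(step, triangular_lr(step, base_lr=0.01, max_lr=1.0, half_cycle=4))
```

In an experiment of the kind described above, `max_lr` would be set far higher than is typical and the total number of steps kept small; the specific values shown are placeholders, not recommendations.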

source & further reading

lesswrong.com: original article