{"slug": "the-cognitive-categorical-transformer-category-theoretic-inductive-biases-for", "title": "The Cognitive Categorical Transformer: Category-Theoretic Inductive Biases for Language Modeling", "summary": "A new 306M-parameter language model architecture, the Cognitive Categorical Transformer (CCT), reaches 21.27 validation perplexity on WikiText-103, a 12% relative reduction from the 24.19 of an identically fine-tuned GPT-2 Small baseline. A retrain-from-scratch ablation attributes 84% of that gain (2.45 of 2.92 PPL) to GT-Full simplicial message passing. The results support an empirical structure/consistency distinction: categorical priors that add new topology improve language modeling, while those that enforce consistency identities do not.", "body_md": "arXiv:2605.28864v1\nAbstract: The Cognitive Categorical Transformer (CCT) is a 306M-parameter architecture that augments a pretrained GPT-2 Small backbone with cognitively grounded components derived from category theory and informed by several ideas from cognitive science. Under a matched-step protocol (215,000 optimizer steps, matched data, matched optimizer and schedule) on WikiText-103, CCT reaches 21.27 validation perplexity, compared with 24.19 for an identically fine-tuned GPT-2 Small baseline. The architecture therefore contributes a 2.92 PPL (12% relative) reduction beyond what in-domain fine-tuning alone provides. A retrain-from-scratch ablation in which GT-Full simplicial message passing is bypassed throughout the entire seven-phase activation schedule reaches 23.72 PPL, localizing 84% of the architectural improvement (2.45 of 2.92 PPL) to GT-Full. We present the first ablation-validated evidence that simplicial message passing improves language-model perplexity at the 306M-parameter scale on WikiText-103. The published GPT-2 Large result of 22.05 zero-shot PPL on WikiText-103, obtained with 6.2x the parameters of GPT-2 Small, is treated here as an external reference, not as the architectural benchmark.
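As a minimal sketch that only re-derives the reported figures (the variable names are illustrative and not taken from the paper), the 12% relative reduction and the 84% share attributed to GT-Full follow directly from the three matched-step perplexities:

```python
# Reported validation perplexities on WikiText-103 (matched-step protocol).
# Variable names are illustrative, not from the paper.
baseline_ppl = 24.19   # identically fine-tuned GPT-2 Small
cct_ppl = 21.27        # full CCT
no_gtfull_ppl = 23.72  # CCT retrained from scratch with GT-Full bypassed

# Total architectural gain over the fine-tuned baseline.
total_gain = baseline_ppl - cct_ppl          # about 2.92 PPL
relative_gain = total_gain / baseline_ppl    # about 0.12

# Gain lost when GT-Full is bypassed, and its share of the total gain.
gtfull_gain = no_gtfull_ppl - cct_ppl        # about 2.45 PPL
gtfull_share = gtfull_gain / total_gain      # about 0.84

print(f'total gain: {total_gain:.2f} PPL ({relative_gain:.0%} relative)')
print(f'GT-Full gain: {gtfull_gain:.2f} PPL ({gtfull_share:.0%} of total)')
```

The attribution is a simple difference-of-differences: the ablated model's perplexity minus the full model's, divided by the full architectural gain.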
Three negative results on consistency-style categorical priors (sheaf smoothing, adjunction round-trip, curvature regularization) and the joint structural-prior result for GT-Full and PrecisionWeightedPP together support an empirical pattern termed the *structure/consistency distinction*, in which categorical priors that add new topology improve language modeling and those that enforce a consistency identity do not.", "url": "https://wpnews.pro/news/the-cognitive-categorical-transformer-category-theoretic-inductive-biases-for", "canonical_source": "https://arxiv.org/abs/2605.28864", "published_at": "2026-05-29 04:00:00+00:00", "updated_at": "2026-05-29 04:20:17.586238+00:00", "lang": "en", "topics": ["large-language-models", "natural-language-processing", "artificial-intelligence", "machine-learning", "neural-networks"], "entities": ["Cognitive Categorical Transformer", "GPT-2 Small", "WikiText-103", "GPT-2 Large"], "alternates": {"html": "https://wpnews.pro/news/the-cognitive-categorical-transformer-category-theoretic-inductive-biases-for", "markdown": "https://wpnews.pro/news/the-cognitive-categorical-transformer-category-theoretic-inductive-biases-for.md", "text": "https://wpnews.pro/news/the-cognitive-categorical-transformer-category-theoretic-inductive-biases-for.txt", "jsonld": "https://wpnews.pro/news/the-cognitive-categorical-transformer-category-theoretic-inductive-biases-for.jsonld"}}