The Cognitive Categorical Transformer: Category-Theoretic Inductive Biases for Language Modeling

wpnews.pro


Source: arxiv.org · Published: 2026-05-29 · Topic: large-language-models


A new 306M-parameter language model architecture, the Cognitive Categorical Transformer (CCT), reached 21.27 validation perplexity on WikiText-103, a 12% relative improvement over an identically fine-tuned GPT-2 Small baseline. Most of the gain comes from GT-Full simplicial message passing: in a retrain-from-scratch ablation, removing it erased 84% of the improvement. The authors read their results as support for a "structure/consistency distinction": categorical priors that add new topology improve language modeling, while priors that enforce consistency identities do not.
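The 12% and 84% figures follow directly from three validation perplexities reported in the paper's abstract: 21.27 (CCT), 24.19 (fine-tuned GPT-2 Small), and 23.72 (CCT retrained with GT-Full bypassed). The sketch below recomputes them. It also converts the perplexity gap into a per-token cross-entropy gap, using the standard definition PPL = exp(mean negative log-likelihood). That conversion is an illustration added here and is not a figure the paper reports. The function names are hypothetical.

```python
import math

# Validation perplexities on WikiText-103, as reported in the abstract.
PPL_CCT = 21.27         # full CCT
PPL_BASELINE = 24.19    # identically fine-tuned GPT-2 Small
PPL_NO_GT_FULL = 23.72  # CCT retrained from scratch with GT-Full bypassed


def relative_reduction(baseline: float, model: float) -> float:
    """Fraction by which `model` lowers perplexity relative to `baseline`."""
    return (baseline - model) / baseline


def attributed_share(baseline: float, full: float, ablated: float) -> float:
    """Share of the total perplexity gain that is lost when a component is ablated."""
    total_gain = baseline - full      # 24.19 - 21.27 = 2.92 PPL
    component_gain = ablated - full   # 23.72 - 21.27 = 2.45 PPL
    return component_gain / total_gain


def nll_gap_nats(baseline: float, model: float) -> float:
    """Per-token cross-entropy gap in nats, since PPL = exp(mean NLL)."""
    return math.log(baseline) - math.log(model)


if __name__ == "__main__":
    print(f"absolute gain: {PPL_BASELINE - PPL_CCT:.2f} PPL")
    print(f"relative gain: {relative_reduction(PPL_BASELINE, PPL_CCT):.1%}")
    print(f"GT-Full share: {attributed_share(PPL_BASELINE, PPL_CCT, PPL_NO_GT_FULL):.1%}")
    print(f"NLL gap:       {nll_gap_nats(PPL_BASELINE, PPL_CCT):.3f} nats/token")
```

The relative gain works out to about 12.1% and the GT-Full share to about 83.9%. These round to the 12% and 84% quoted in the summary.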


Abstract (arXiv:2605.28864v1): The Cognitive Categorical Transformer (CCT) is a 306M-parameter architecture that augments a pretrained GPT-2 Small backbone with cognitively grounded components derived from category theory and several inspirations from cognitive science. Under a matched-step protocol (215,000 optimizer steps, matched data, matched optimizer and schedule) on WikiText-103, CCT reaches 21.27 validation perplexity, compared with 24.19 for an identically fine-tuned GPT-2 Small baseline. The architecture therefore contributes a 2.92 PPL (12% relative) reduction beyond what in-domain fine-tuning alone provides. A retrain-from-scratch ablation that keeps GT-Full simplicial message passing bypassed throughout the entire seven-phase activation schedule reaches 23.72 PPL, localizing 84% of the architectural improvement (2.45 of 2.92 PPL) to GT-Full. We present the first ablation-validated evidence that simplicial message passing improves language-model perplexity at the 306M-parameter scale on WikiText-103. Published GPT-2 Large reaches 22.05 zero-shot PPL on WikiText-103 with 6.2x more parameters than GPT-2 Small; this paper treats that number as an external published reference, not as the architectural benchmark. Three negative results on consistency-style categorical priors (sheaf smoothing, adjunction round-trip, curvature regularization), together with the joint structural-prior result for GT-Full and PrecisionWeightedPP, support an empirical pattern termed the structure/consistency distinction: categorical priors that add new topology improve language modeling, and those that enforce a consistency identity do not.
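The abstract does not describe how GT-Full works internally. To show what "simplicial message passing" means in general, the sketch below runs message passing on a small 2-dimensional simplicial complex made of nodes, edges, and triangles. Information flows from nodes up to edges and triangles, then back down to the nodes. The higher-order simplices are the kind of "added topology" the structure/consistency distinction refers to. This is a generic textbook-style illustration, not the paper's GT-Full layer. It assumes scalar features per node and mean aggregation, and the names `build_complex`, `simplicial_step`, and `alpha` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_complex(triangles):
    """Return (edges, triangles) of the 2-complex generated by the given triangles.

    Hypothetical helper: every triangle contributes its three edges.
    """
    edges = set()
    for tri in triangles:
        edges.update(combinations(sorted(tri), 2))
    return sorted(edges), [tuple(sorted(t)) for t in triangles]


def simplicial_step(x, edges, triangles, alpha=0.5):
    """One generic round of message passing: nodes -> edges -> triangles -> nodes.

    x maps node -> scalar feature. alpha (hypothetical) mixes the old feature
    with the mean of incoming messages. Returns a new dict of node features.
    """
    # Lift node features to edges: mean of the two endpoints.
    edge_feat = {e: (x[e[0]] + x[e[1]]) / 2 for e in edges}
    # Lift edge features to triangles: mean of the three boundary edges.
    tri_feat = {t: sum(edge_feat[e] for e in combinations(t, 2)) / 3 for t in triangles}
    # Send messages back down from every incident edge and triangle.
    inbox = defaultdict(list)
    for e, f in edge_feat.items():
        for v in e:
            inbox[v].append(f)
    for t, f in tri_feat.items():
        for v in t:
            inbox[v].append(f)
    # Node update: convex mix of the old feature and the mean incoming message.
    return {
        v: (1 - alpha) * xv + alpha * (sum(inbox[v]) / len(inbox[v]) if inbox[v] else xv)
        for v, xv in x.items()
    }


if __name__ == "__main__":
    # Four "tokens" joined by two triangles that share the edge (1, 2).
    edges, tris = build_complex([(0, 1, 2), (1, 2, 3)])
    x = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}
    for step in range(2):
        x = simplicial_step(x, edges, tris)
        print(step, {v: round(f, 3) for v, f in x.items()})
```

In this toy run, node 0's signal reaches nodes 1 and 2 after one step and node 3 after two, because it has to pass through the shared edge and triangle. A realistic layer would likely use vector features and learned weights per simplex order. The scalar version is kept only so that the flow of information is easy to trace.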

source & further reading


Read original on arxiv.org → arxiv.org/abs/2605.28864
