{"slug": "itnet-a-learnable-integral-transform-that-subsumes-convolution-attention-and", "title": "ITNet: A Learnable Integral Transform That Subsumes Convolution, Attention, and Recurrence", "summary": "Researchers introduced the Integral Transform Network (ITNet), a unified architecture that subsumes convolution, attention, and recurrence as special cases of a learnable integral transform. ITNet matches or exceeds specialized baselines on ImageNet-1K, GLUE, ModelNet40, VQA v2, and NLVR2, demonstrating that a single learned interaction mechanism can recover the behavior of all three architectural families from data.", "body_md": "arXiv:2606.19538v1 Announce Type: new\nAbstract: Convolutional networks, recurrent networks, and transformers each encode different inductive biases -- locality, sequential memory, and content-dependent pairwise interaction -- and have remained mathematically distinct since their inception. We show that this fragmentation reflects not a fundamental diversity in how signals should be processed, but rather incomplete views of a single underlying mathematical object: a learnable integral transform. We introduce the Integral Transform Network (ITNet), a unified architecture built around a learnable kernel that depends jointly on positions and features. This kernel is implemented as a small neural network, specifically an MLP, that models pairwise interactions, enabling the model to adapt its behavior from data. We show that convolution, self-attention (including multi-head), and autoregressive recurrence (including LSTM, GRU, S4, and Mamba) arise as special cases under appropriate parameterizations, and that ITNet is a universal approximator of continuous operators. To make this practical, we develop tiled kernel fusion, importance-weighted Monte Carlo integration, and learned low-rank factorization, enabling efficient and scalable computation. 
A single ITNet architecture with a shared operator and lightweight modality-specific encoders matches or exceeds specialized baselines on ImageNet-1K, GLUE, ModelNet40, VQA v2, and NLVR2. The results demonstrate that a single learned interaction mechanism can recover the behavior of all three architectural families from data.", "url": "https://wpnews.pro/news/itnet-a-learnable-integral-transform-that-subsumes-convolution-attention-and", "canonical_source": "https://arxiv.org/abs/2606.19538", "published_at": "2026-06-19 04:00:00+00:00", "updated_at": "2026-06-19 04:03:18.112606+00:00", "lang": "en", "topics": ["machine-learning", "neural-networks", "artificial-intelligence", "ai-research"], "entities": ["ITNet", "ImageNet-1K", "GLUE", "ModelNet40", "VQA v2", "NLVR2", "LSTM", "GRU"], "alternates": {"html": "https://wpnews.pro/news/itnet-a-learnable-integral-transform-that-subsumes-convolution-attention-and", "markdown": "https://wpnews.pro/news/itnet-a-learnable-integral-transform-that-subsumes-convolution-attention-and.md", "text": "https://wpnews.pro/news/itnet-a-learnable-integral-transform-that-subsumes-convolution-attention-and.txt", "jsonld": "https://wpnews.pro/news/itnet-a-learnable-integral-transform-that-subsumes-convolution-attention-and.jsonld"}}