Tensor Memory: Fixed-Size Recurrent State for Long-Horizon Transformers

wpnews.pro

Source: arxiv.org · Published: 2026-05-28T04:00Z · Topic: machine-learning

Researchers have introduced Tensor Memory, a lightweight module that augments Transformer blocks with a fixed-size recurrent 3D memory tensor for long video sequences. Tokens write into a voxel grid through a differentiable soft write, the memory is updated with a local interaction operator and gated recurrent dynamics, and tokens read context back via continuous sampling. Because the memory has a constant size, state capacity is decoupled from input length while a spatial inductive bias is preserved. The module targets long-horizon video understanding and occlusion-sensitive reasoning, and it can be attached to or removed from existing blocks without other architectural changes.
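To make "decoupling state capacity from input length" concrete, the sketch below counts stored floats for a KV cache against a fixed voxel grid. Every size here (12 layers, width 768, a 16 x 16 x 16 grid with 64 channels) is a hypothetical choice for illustration and is not taken from the paper.

```python
def kv_cache_floats(seq_len, n_layers, d_model):
    """Floats held by a KV cache: one key and one value vector per token per layer."""
    return 2 * seq_len * n_layers * d_model


def tensor_memory_floats(grid, channels):
    """Floats held by a fixed D x H x W voxel grid with `channels` features per voxel."""
    depth, height, width = grid
    return depth * height * width * channels


# Hypothetical model and grid sizes, chosen only to show the scaling behaviour.
N_LAYERS, D_MODEL = 12, 768
GRID, CHANNELS = (16, 16, 16), 64

for seq_len in (1_000, 10_000, 100_000):
    cache = kv_cache_floats(seq_len, N_LAYERS, D_MODEL)
    grid_state = tensor_memory_floats(GRID, CHANNELS)
    print(f"tokens={seq_len:>7}  kv_cache={cache:>13}  tensor_memory={grid_state}")
```

The cache count grows linearly with the number of tokens, while the grid count is the same at every sequence length. The comparison says nothing about accuracy; it only illustrates the storage argument.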

Published May 28, 2026

arXiv:2605.27686v1. Abstract: Transformers process images and videos by flattening space and time into long token sequences. While attention and KV caching preserve past features, their memory grows with sequence length and they lack an explicit, persistent spatial state, making long-horizon video understanding and occlusion-sensitive reasoning difficult. We propose Tensor Memory, a lightweight module that augments Transformer blocks with a fixed-size recurrent 3D memory tensor: tokens write into a voxel grid via a differentiable soft write that deposits content as a Gaussian-weighted volume around a predicted continuous 3D location, the memory is updated with an efficient local interaction operator and gated recurrent dynamics, and tokens read back context via continuous sampling with gated residual fusion. Because the memory tensor has a constant size, Tensor Memory decouples state capacity from input length while preserving a spatial inductive bias. We evaluate the module on standard language, image, and video benchmarks and on a controlled toy diagnostic suite designed to isolate when persistent state is beneficial; it integrates with standard Transformer training pipelines and can be attached to or removed from existing blocks without other architectural changes.
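The write, update, and read steps named in the abstract can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the function names, the normalised Gaussian, the scalar fixed gate (a learned gate would be used in practice), and trilinear interpolation as the "continuous sampling" are all choices made here, and the local interaction operator and gated residual fusion are omitted.

```python
import math
from itertools import product


def make_grid(size, channels):
    """Zero-initialised cubic memory, indexed [z][y][x] -> list of channel values."""
    return [[[[0.0] * channels for _ in range(size)]
             for _ in range(size)] for _ in range(size)]


def soft_write(size, channels, pos, content, sigma=0.75):
    """Spread `content` over the grid with normalised Gaussian weights centred on `pos`."""
    cells = list(product(range(size), repeat=3))
    weights = [math.exp(-sum((c - p) ** 2 for c, p in zip(cell, pos)) / (2 * sigma ** 2))
               for cell in cells]
    total = sum(weights)
    volume = make_grid(size, channels)
    for (z, y, x), w in zip(cells, weights):
        volume[z][y][x] = [w / total * value for value in content]
    return volume


def gated_update(memory, candidate, gate):
    """Per-voxel blend m <- (1 - g) * m + g * candidate; `gate` is a fixed scalar here (assumption)."""
    size = len(memory)
    for z, y, x in product(range(size), repeat=3):
        memory[z][y][x] = [(1 - gate) * m + gate * c
                           for m, c in zip(memory[z][y][x], candidate[z][y][x])]
    return memory


def trilinear_read(memory, pos):
    """Read at continuous `pos` (inside the grid) by interpolating the 8 surrounding voxels."""
    size = len(memory)
    base = [min(int(math.floor(p)), size - 2) for p in pos]
    frac = [p - b for p, b in zip(pos, base)]
    out = [0.0] * len(memory[0][0][0])
    for offsets in product((0, 1), repeat=3):
        w = 1.0
        for f, d in zip(frac, offsets):
            w *= f if d else 1 - f
        voxel = memory[base[0] + offsets[0]][base[1] + offsets[1]][base[2] + offsets[2]]
        out = [o + w * v for o, v in zip(out, voxel)]
    return out


# One token writes a 2-channel feature near the grid centre, then two locations are read back.
memory = make_grid(4, 2)
memory = gated_update(memory, soft_write(4, 2, (1.5, 1.5, 1.5), [1.0, 2.0]), gate=0.5)
near = trilinear_read(memory, (1.5, 1.5, 1.5))
far = trilinear_read(memory, (0.0, 0.0, 0.0))
print("near:", near, "far:", far)
```

Reading at the write location returns a stronger response than reading at a distant corner, which is the spatial inductive bias the abstract refers to. Every step is built from smooth operations on the continuous position, so in an autodiff framework gradients could flow back to the predicted location.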

source & further reading

arxiv.org — original article
