04:00
2026-06-16
arxiv.org
computer-vision
Beyond Self-Attention: Sub-Quadratic Vision Transformers for Fast Image Captioning
Researchers proposed a sub-quadratic vision transformer for image captioning that replaces standard self-attention with a Gaussian Mixture Model-based clustering mechanism, reducing computational comp…