S4 vs Mamba vs RWKV — what's the difference? | Rudrite Research

Rudrite Research published a comparison of three post-Transformer sequence models (S4, Mamba, and RWKV), highlighting how each tries to keep cost linear in sequence length without giving up quality. The analysis explains how the models differ: S4 uses a structured state space, Mamba adds a selective mechanism that makes the state update depend on the input, and RWKV is a linear-attention RNN. The comparison aims to clarify the trade-offs for researchers and practitioners.

S4 vs Mamba vs RWKV

The post-Transformer sequence lineage: a structured state space, a selective one, and a linear-attention RNN, all chasing linear cost without losing quality. A clear, side-by-side comparison with examples, part of Rudrite Research.
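
As a rough illustration of the three ideas named above, the following Python sketch reduces each model to a scalar recurrence. It is a toy under stated assumptions, not any of the published implementations: the function names, the constants, and the scalar state are all invented for illustration. Real S4 uses structured state matrices and can also be evaluated as a convolution, Mamba derives several of its parameters from the input and uses a hardware-aware scan, and RWKV's WKV operator has extra terms (for example a bonus for the current token) that are omitted here.

```python
import math


def lti_scan(xs, a=0.9, b=0.1, c=1.0):
    """S4-flavoured toy: a, b, c are fixed for every step (time-invariant)."""
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x          # same update rule at every position
        ys.append(c * h)
    return ys


def selective_scan(xs, w_delta=1.0):
    """Mamba-flavoured toy: the step size, and so the decay, depends on x."""
    h, ys = 0.0, []
    for x in xs:
        delta = math.log1p(math.exp(w_delta * x))  # softplus keeps delta > 0
        a_bar = math.exp(-delta)                   # input-dependent decay
        h = a_bar * h + (1.0 - a_bar) * x          # large delta: overwrite state
        ys.append(h)
    return ys


def linear_attention_scan(ks, vs, w=0.5):
    """RWKV-flavoured toy: decayed running sums replace the attention matrix."""
    decay = math.exp(-w)
    num, den, ys = 0.0, 0.0, []
    for k, v in zip(ks, vs):
        weight = math.exp(k)       # unnormalised weight of this token
        num = decay * num + weight * v
        den = decay * den + weight
        ys.append(num / den)       # weighted average of the values seen so far
    return ys


if __name__ == "__main__":
    xs = [1.0, 0.0, 2.0, -1.0]
    print(lti_scan(xs))
    print(selective_scan(xs))
    print(linear_attention_scan(ks=[0.0, 1.0, 0.0, -1.0], vs=xs))
```

Each loop does a constant amount of work per token and carries a fixed-size state, which is where the linear cost comes from. What differs is what controls the state update: fixed coefficients in the first function, coefficients computed from the current input in the second, and exponentially decayed key weights in the third.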