From Lightning to Sparse: How MiniMax M3 Reads a Million Tokens Without Reading Them All
MiniMax introduces M3, a sparse attention mechanism that processes up to a million tokens efficiently by reading only the relevant parts of the input, overcoming production failures of prior e…