MiniMax Sparse Attention: Per-Group Block Selection for Cheap Million-Token Inference

MiniMax introduced a sparse attention mechanism that selects blocks on a per-group basis, making inference on million-token sequences cost-effective. The technique reduces the computational overhead of attention while maintaining model quality, which could lower the barrier for long-context AI applications.
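The summary gives no implementation details, so the following is only a minimal sketch of what block selection per group could look like, not MiniMax's actual method. It assumes that a "group" is a set of query heads sharing one key/value head (as in grouped-query attention), that each key block is scored by its mean key, and that every head in a group attends to the same top-k blocks. The function name `block_sparse_attention` and the parameters `block_size` and `top_k` are hypothetical.

```python
import numpy as np

def block_sparse_attention(q, k, v, block_size, top_k):
    """Illustrative per-group block selection for one query position.

    q:    (groups, heads_per_group, d)  query vectors for a single token
    k, v: (groups, seq_len, d)          one shared K/V head per group (assumed)
    Returns (output, selected) with shapes
    (groups, heads_per_group, d) and (groups, top_k).
    """
    groups, heads, d = q.shape
    seq_len = k.shape[1]
    assert seq_len % block_size == 0, "sketch assumes whole blocks only"
    n_blocks = seq_len // block_size
    top_k = min(top_k, n_blocks)

    k_blocks = k.reshape(groups, n_blocks, block_size, d)
    v_blocks = v.reshape(groups, n_blocks, block_size, d)

    # Assumed scoring rule: summarise each block by its mean key.
    block_keys = k_blocks.mean(axis=2)                        # (groups, n_blocks, d)

    # Score blocks per head, then pool over the heads of each group so that
    # all heads in a group share one set of selected blocks.
    scores = np.einsum("ghd,gbd->ghb", q, block_keys)         # (groups, heads, n_blocks)
    group_scores = scores.sum(axis=1)                         # (groups, n_blocks)
    selected = np.argsort(-group_scores, axis=1)[:, :top_k]
    selected = np.sort(selected, axis=1)                      # keep sequence order

    out = np.empty_like(q)
    for g in range(groups):
        # Gather only the selected blocks; cost scales with top_k, not seq_len.
        k_sel = k_blocks[g, selected[g]].reshape(-1, d)
        v_sel = v_blocks[g, selected[g]].reshape(-1, d)
        logits = q[g] @ k_sel.T / np.sqrt(d)                  # (heads, top_k * block_size)
        logits -= logits.max(axis=1, keepdims=True)           # numerically stable softmax
        weights = np.exp(logits)
        weights /= weights.sum(axis=1, keepdims=True)
        out[g] = weights @ v_sel
    return out, selected
```

Under these assumptions, the exact attention step touches `top_k * block_size` keys per group instead of `seq_len`, which is where the savings on very long sequences would come from. Setting `top_k` to the total number of blocks recovers ordinary dense attention, which is a convenient sanity check for the sketch.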
