{"slug": "minimax-m3-shows-what-happens-when-ai-stops-thinking-in-turns", "title": "MiniMax M3 Shows What Happens When AI Stops Thinking in Turns", "summary": "MiniMax released M3, an open-weight AI model with a 1 million token context window and native multimodal support, which outperformed frontier models in a 24-hour CUDA kernel optimization task by persisting through 145 submissions where most others stopped within 30. The model, built on a new sparse attention architecture called MSA, led on SVG-Bench and Claw-Eval and placed second on KernelBench Hard while competing with closed models from Anthropic, OpenAI, and Google. MiniMax made the API and its agent product available immediately, with model weights promised within 10 days.", "body_md": "Most models quit around submission 30 because they stop finding improvements and exit on their own. That’s what happened when MiniMax ran a CUDA kernel optimization task against a field of frontier models. Every model except two called it done within the first 30 submissions.\n\nM3’s best result came on submission 145. After 24 hours. After multiple plateaus where the numbers stopped moving and a reasonable model would have concluded there was nothing left to find.\n\nThat’s the thing MiniMax [released](https://www.minimax.io/blog/minimax-m3) yesterday. An AI model with a 1M token context window, native multimodality, and apparently a problem with knowing when to stop.\n\n**What M3 is**\n\nM3 is an open-weight model with a 1 million token context window, native multimodal support for images and video, and what MiniMax describes as frontier-level coding and agentic performance. The weights aren’t out yet. MiniMax says they’re coming in about 10 days, but the API is live and MiniMax Code, their agent product built specifically around M3, is available now.\n\nClosed models with these capabilities exist. What hasn’t existed until now is one you can actually run, inspect, and build on. 
MiniMax is explicit that M3 is the first open-weight model to combine all three: long context at this scale, native multimodality from the first step of training, and agentic performance that competes with the frontier closed models.\n\n**The architecture behind the context window**\n\nA 1M token context window is only useful if the model can actually reason across it without the whole thing becoming unwieldy. Most long-context models struggle here: the cost of standard attention, the mechanism that makes transformers work, grows quadratically with context length, and at 1M tokens that cost becomes either prohibitive or a hidden quality tradeoff.\n\nMiniMax built a new attention architecture for M3 called MSA, MiniMax Sparse Attention. The short version: instead of every token attending to every other token, MSA partitions the context into blocks and routes attention more precisely. At 1M tokens, per-token compute drops to 1/20th of what MiniMax’s previous model needed. Prefill runs more than 9x faster and decoding more than 15x faster.\n\nThis matters for the agentic story specifically because long-horizon tasks generate dense, structured context fast. Every tool call, every result, every iteration adds to the pile. A model that degrades as that pile grows isn’t useful for 24-hour tasks, regardless of what the benchmarks say. 
MSA is MiniMax’s answer to that specific problem, and the CUDA kernel run is arguably the best stress test they could have picked to demonstrate it.\n\n**The benchmarks**\n\nAll numbers in this section are self-reported by MiniMax.\n\n| Benchmark | MiniMax M3 | Claude Opus 4.7 | GPT-5.5 | Gemini 3.1 Pro | Top performer |\n|---|---|---|---|---|---|\n| SWE Bench Pro | 59.0 | 64.3 | 58.6 | 54.2 | Claude Opus 4.7 |\n| Terminal Bench 2.1 | 66.0 | 66.1 | 78.2 | 70.0 | GPT-5.5 |\n| VIBE V2 | 50.1 | 55.8 | 50.5 | 28.0 | Claude Opus 4.7 |\n| SVG-Bench | 63.7 | 62.3 | 58.2 | 59.2 | MiniMax M3 |\n| KernelBench Hard | 28.8 | 30.7 | 20.9 | 18.6 | Claude Opus 4.7 |\n| BrowseComp | 83.5 | 79.3 | 84.4 | 85.9 | Gemini 3.1 Pro |\n| GDPval rubrics | 74.7 | 79.8 | 80.6 | 57.8 | GPT-5.5 |\n| BankerToolBench | 76.1 | 81.3 | 70.0 | 67.0 | Claude Opus 4.7 |\n| MCP Atlas | 74.2 | 77.0 | 75.3 | 69.2 | Claude Opus 4.7 |\n| OSWorld-verified | 70.0 | 82.8 | 78.7 | 76.2 | Claude Opus 4.7 |\n\nOn Claw-Eval, which tests end-to-end autonomous agent performance, M3 scores 74.5 against Claude Opus 4.7’s 71.6 and Gemini 3.1 Pro’s 57.8. On SVG-Bench it leads the entire comparison at 63.7, ahead of Opus 4.7 at 62.3 and GPT-5.5 at 58.2. On KernelBench Hard, which tests the kind of low-level optimization work the CUDA task exemplifies, M3 scores 28.8: second only to Opus 4.7’s 30.7 and well ahead of GPT-5.5’s 20.9 and Gemini 3.1 Pro’s 18.6. SpreadsheetBench puts it at 89.35, competitive with every closed model in the comparison.\n\nThe pattern across these isn’t “M3 beats everything.” It’s more specific than that. The benchmarks where M3 leads tend to be the ones that reward persistence, structured output, and long-context coherence. The ones where it lags, such as SWE-fficiency, Apex-Agents, and OSWorld, tend to favor precise single-step execution or GUI interaction. 
That’s a consistent profile, not a scattered one, and it matches what the CUDA story already suggested.\n\n**Limitations**\n\nOSWorld, which tests a model’s ability to operate a real desktop GUI, has M3 at 70.06 against Opus 4.7’s 82.8 and GPT-5.5’s 78.7. That’s not close. SWE-fficiency, which measures how efficiently a model solves software engineering tasks rather than just whether it solves them, has M3 at 34.8 against Opus 4.7’s 42.2. M3 also trails on Apex-Agents, scoring 27.7 against GPT-5.5’s 41.7.\n\nM3 is strong when the task rewards persistence and long-context coherence. It’s weaker when the task demands accurate single-step execution, especially anything involving GUI interaction or strict instruction following across many steps. MiniMax doesn’t hide this; the model card flags the agentic gaps directly.\n\n**How to try it**\n\nThe API is live now on MiniMax’s platform. Pricing is tiered at 512K tokens: a standard rate applies below that threshold and a higher rate above it, aimed at long-document and full-repository work. Thinking mode can be toggled per request.\n\nThe weights aren’t available yet. MiniMax says they’re coming within 10 days, along with the technical report. For now, MiniMax Code is available as a desktop app and runs on token-based subscription plans starting at $20 a month.\n\n**Submission 145**\n\nNearly every other model in that CUDA test stopped making progress and exited. M3 kept going and found its best result 115 submissions after the point where most of them had quit. 
MiniMax’s paper reproduction task tells the same story: 12 hours, 18 commits, and no human in the loop.\n\nOnce the weights land on Hugging Face, community quantizations that run on consumer hardware may follow as well.", "url": "https://wpnews.pro/news/minimax-m3-shows-what-happens-when-ai-stops-thinking-in-turns", "canonical_source": "https://firethering.com/minimax-m3-open-weight-model/", "published_at": "2026-06-02 19:25:39+00:00", "updated_at": "2026-06-02 20:40:22.971751+00:00", "lang": "en", "topics": ["large-language-models", "artificial-intelligence", "ai-products", "ai-agents", "ai-research"], "entities": ["MiniMax", "M3", "MiniMax Code"], "alternates": {"html": "https://wpnews.pro/news/minimax-m3-shows-what-happens-when-ai-stops-thinking-in-turns", "markdown": "https://wpnews.pro/news/minimax-m3-shows-what-happens-when-ai-stops-thinking-in-turns.md", "text": "https://wpnews.pro/news/minimax-m3-shows-what-happens-when-ai-stops-thinking-in-turns.txt", "jsonld": "https://wpnews.pro/news/minimax-m3-shows-what-happens-when-ai-stops-thinking-in-turns.jsonld"}}