MiniMax M3: Open-Weight Frontier Model at 5% of Opus Cost

MiniMax released the M3 open-weight model, claiming it costs 5% of Claude Opus per task, achieves 59% on SWE-Bench Pro, and supports a 1-million-token context window at one-twentieth the compute of its predecessor. The model uses a new sparse attention architecture for efficient long-context processing and is available via API, OpenRouter, or self-hosted open weights. All benchmark results are vendor-published and lack independent verification.

MiniMax just released an open-weight model that costs roughly 5% of Claude Opus per task, hits 59% on SWE-Bench Pro, and runs a 1-million-token context window at one-twentieth the compute of its predecessor. Those numbers deserve scrutiny, since they are all vendor-published, but MiniMax M3 (https://www.minimax.io/blog/minimax-m3) is now the most cost-competitive option for developers building agentic systems, and it is worth understanding how it works before deciding whether to trust it.

What Makes 1M Context Actually Practical

Long context claims are common. Most models slow to a crawl or produce degraded output as they approach their limits. MiniMax M3 addresses this with a new attention architecture called MSA (MiniMax Sparse Attention), which selects relevant key-value blocks rather than computing full attention across the entire context window. The result, according to MiniMax: at 1 million tokens, per-token compute drops to one-twentieth of the previous M2 generation, delivering 9x faster prefill and 15x faster decode at maximum context.

MiniMax demonstrated this with a 24-hour autonomous CUDA kernel optimization task that improved hardware utilization from 7.6% to 71.3%, and a 12-hour paper reproduction task, both completed without human intervention. Those are the kinds of long-horizon workflows where a large context window actually earns its keep.

The Benchmarks: What the Numbers Say and Don’t

MiniMax M3 scores 59.0% on SWE-Bench Pro, 66.0% on Terminal-Bench 2.1, and 83.5 on BrowseComp, figures that reportedly put it above GPT-5.5 and Gemini 3.1 Pro on coding, and above Claude Opus 4.7 on autonomous browsing. The MSA technical report (https://arxiv.org/abs/2606.13392) was published on arXiv on June 11.

The necessary caveat: every one of those results was run by MiniMax, on MiniMax infrastructure, using MiniMax agent scaffolding. Independent third-party benchmarks were not available at launch on June 1.
The comparison baseline is Claude Opus 4.7, not the more recently released 4.8, so the gap to the current frontier is likely larger than the launch materials suggest. Benchmark your actual workflows before committing to production.

The Cost Breakdown

The economics are where M3 makes its strongest case. Standard API pricing is $0.60 per million input tokens and $2.40 per million output tokens. A task consuming 500,000 input tokens and 100,000 output tokens costs roughly $0.54 with M3 at standard rates, versus $5.00 with Claude Opus 4.7: about 11% of the cost. The promotional rate halves that to roughly $0.27, which is where the 5% figure comes from, though you should plan against standard rates for production budgeting.

| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| MiniMax M3 standard | $0.60 | $2.40 |
| MiniMax M3 promo | $0.30 | $1.20 |
| Claude Opus 4.7 | $5.00 | $25.00 |
| GPT-5.5 | ~$10.00 | ~$30.00 |

Three Ways to Use It

MiniMax M3 is available through three paths depending on how much control you want:

- MiniMax API: First-party, OpenAI-compatible endpoint at api.minimax.io. Native multimodal support, thinking mode toggle, and the full 1M context window. This is the path with the data governance considerations discussed below.
- OpenRouter: Fastest route for testing. Set the model to minimax/minimax-m3 using existing OpenRouter credentials and you are running within minutes.
- Self-hosted open weights: The model is on Hugging Face as MiniMaxAI/MiniMax-M3 (https://huggingface.co/MiniMaxAI/MiniMax-M3) with SGLang, vLLM, and Transformers support. Quantized builds for llama.cpp, Ollama, and LM Studio are available. Fair warning: the full BF16 checkpoint is a 427B-parameter model, with 23B parameters active per token. That is a real infrastructure commitment.
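Whichever path you pick, it helps to price your own workload against the table above before committing. The following is a minimal sketch in Python that applies the listed per-million-token prices to a given token count. The dictionary keys and the function names `task_cost` and `cost_ratio` are hypothetical labels chosen for this example, and GPT-5.5 is left out because its prices are only approximate.

```python
# Prices in USD per 1M tokens, copied from the pricing table.
# The keys are illustrative labels, not official API model IDs.
PRICES = {
    "minimax-m3-standard": (0.60, 2.40),
    "minimax-m3-promo": (0.30, 1.20),
    "claude-opus-4.7": (5.00, 25.00),
}


def task_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one task for the given token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cost_ratio(model, baseline, input_tokens, output_tokens):
    """Return the cost of `model` as a fraction of the cost of `baseline`."""
    return task_cost(model, input_tokens, output_tokens) / task_cost(
        baseline, input_tokens, output_tokens
    )


if __name__ == "__main__":
    # The example task from the text: 500k input tokens, 100k output tokens.
    for name in PRICES:
        print(f"{name}: ${task_cost(name, 500_000, 100_000):.2f}")
```

For the 500,000-input, 100,000-output task this gives $0.54 at M3 standard rates, $0.27 at the promotional rate, and $5.00 for Claude Opus 4.7. The ratio depends on your input-to-output mix, so substitute token counts from a real run rather than relying on a single headline percentage.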
A minimal request through OpenRouter looks like this:

```shell
# Assumes OPENROUTER_API_KEY is exported in your shell.
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "minimax/minimax-m3", "messages": [{"role": "user", "content": "Review this code for bugs and performance issues."}]}'
```

One Question Worth Answering Before You Deploy

MiniMax is a Shanghai-based company subject to China’s 2017 National Intelligence Law, which obliges organizations to support and cooperate with state intelligence work. For the hosted API, this is a compliance question that deserves an explicit answer depending on what you are sending through the model. Self-hosting the open weights (https://huggingface.co/MiniMaxAI/MiniMax-M3) keeps your data off MiniMax infrastructure and removes this concern. If your workloads involve regulated data or sensitive IP, the self-hosted path is the right call regardless of the cost story.

Bottom Line

MiniMax M3 is a technically serious model with a cost structure that changes the economics of agentic workflows. It is not a drop-in replacement for Claude Opus on sensitive workloads, and its benchmark numbers need independent verification. But for long-horizon coding agents, large-document pipelines, and high-volume inference where the numbers check out on your tasks, it has earned a real evaluation, not just a footnote. Run it on one real workflow first.

The official launch post (https://www.minimax.io/blog/minimax-m3) covers the architecture in depth, and the MSA technical report (https://arxiv.org/abs/2606.13392) has the attention mechanism specifics for those who want to go deeper. For a warts-and-all practical evaluation, the agentic workflow evaluation on Medium (https://medium.com/@cognidownunder/i-evaluated-minimax-m3-for-agentic-workflows-the-results-are-complicated-518b60d5e6a9) is worth reading before you commit.
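For intuition before opening the technical report, the idea of selecting key-value blocks instead of attending over the full context can be sketched in a few lines of plain Python. This is a generic top-k block-sparse attention toy for a single query, not MiniMax's MSA implementation. The scoring rule (query dotted with each block's mean key) and the names `block_sparse_attention`, `block_size`, and `top_k` are assumptions made for illustration only.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def block_sparse_attention(query, keys, values, block_size, top_k):
    """Attend over only the top_k highest-scoring key-value blocks.

    Illustrative assumption: a block's relevance is the dot product of
    the query with the mean of the keys in that block.
    Returns (output_vector, attended_positions).
    """
    # Split key positions into contiguous blocks.
    blocks = [
        list(range(start, min(start + block_size, len(keys))))
        for start in range(0, len(keys), block_size)
    ]

    def block_score(positions):
        mean_key = [
            sum(keys[i][d] for i in positions) / len(positions)
            for d in range(len(query))
        ]
        return dot(query, mean_key)

    # Keep the top_k blocks, then restore positional order.
    ranked = sorted(
        range(len(blocks)), key=lambda b: block_score(blocks[b]), reverse=True
    )
    selected = sorted(ranked[:top_k])
    positions = [i for b in selected for i in blocks[b]]

    # Scaled dot-product softmax attention, restricted to selected positions.
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in positions]
    peak = max(logits)
    weights = [math.exp(value - peak) for value in logits]
    total = sum(weights)
    output = [
        sum(w * values[i][d] for w, i in zip(weights, positions)) / total
        for d in range(len(values[0]))
    ]
    return output, positions
```

A small usage example: with eight keys in two blocks of four, a query aligned with the second block attends only to positions 4 through 7.

```python
keys = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
values = [[1.0]] * 4 + [[2.0]] * 4
output, positions = block_sparse_attention(
    [0.0, 1.0], keys, values, block_size=4, top_k=1
)
# positions is [4, 5, 6, 7] and output is [2.0]
```

The softmax here runs over `top_k * block_size` keys rather than the whole context, which is the source of the savings. This toy still recomputes every block's mean key on each call, so a practical system would need to cache or otherwise cheapen the block summaries. How MSA actually scores and selects blocks is covered in the technical report.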