{"slug": "minimax-m3-open-weight-frontier-model-at-5-of-opus-cost", "title": "MiniMax M3: Open-Weight Frontier Model at 5% of Opus Cost", "summary": "MiniMax released the M3 open-weight model, claiming it costs 5% of Claude Opus per task, achieves 59% on SWE-Bench Pro, and supports a 1-million-token context window at one-twentieth the compute of its predecessor. The model uses a new sparse attention architecture for efficient long-context processing and is available via API, OpenRouter, or self-hosted open weights. All benchmark results are vendor-published and lack independent verification.", "body_md": "MiniMax just released an open-weight model that costs roughly 5% of Claude Opus per task, hits 59% on SWE-Bench Pro, and runs a 1-million-token context window at one-twentieth the compute of its predecessor. Those numbers deserve scrutiny — they are all vendor-published — but [MiniMax M3](https://www.minimax.io/blog/minimax-m3) is now the most cost-competitive option for developers building agentic systems, and it is worth understanding how it works before deciding whether to trust it.\n\n## What Makes 1M Context Actually Practical\n\nLong context claims are common. Most models slow to a crawl or produce degraded output as they approach their limits. MiniMax M3 addresses this with a new attention architecture called MSA — MiniMax Sparse Attention — which selects relevant key-value blocks rather than computing full attention across the entire context window.\n\nThe result: at 1 million tokens, per-token compute drops to one-twentieth of the previous M2 generation, delivering 9x faster prefill and 15x faster decode at max context. MiniMax demonstrated this with a 24-hour autonomous CUDA kernel optimization task (improving hardware utilization from 7.6% to 71.3%) and a 12-hour paper reproduction task, both completed without human intervention. 
Those are the kinds of long-horizon workflows where a large context window actually earns its keep.\n\n## The Benchmarks: What the Numbers Say (and Don’t)\n\nMiniMax M3 scores 59.0% on SWE-Bench Pro, 66.0% on Terminal-Bench 2.1, and 83.5 on BrowseComp. Those figures reportedly put it above GPT-5.5 and Gemini 3.1 Pro on coding, and above Claude Opus 4.7 on autonomous browsing. The [MSA technical report](https://arxiv.org/abs/2606.13392) was published on arXiv on June 11.\n\nThe necessary caveat: every one of those results was run by MiniMax, on MiniMax infrastructure, using MiniMax agent scaffolding. Independent third-party benchmarks were not available at launch on June 1. The comparison baseline is Claude Opus 4.7, not the more recently released 4.8, so the gap to the current frontier is larger than the launch materials suggest. Benchmark your actual workflows before committing to production.\n\n## The Cost Breakdown\n\nThe economics are where M3 makes its strongest case. Standard API pricing is $0.60 per million input tokens and $2.40 per million output tokens. A task consuming 500,000 input tokens and 100,000 output tokens costs about $0.54 with M3 at standard rates (0.5 × $0.60 + 0.1 × $2.40), versus $5.00 with Claude Opus 4.7 (0.5 × $5.00 + 0.1 × $25.00), or about 11% of the cost. The promotional rate halves that to roughly $0.27, about 5% of the Opus cost, though you should plan against standard rates for production budgeting.\n\n| Model | Input / 1M tokens | Output / 1M tokens |\n|---|---|---|\n| MiniMax M3 (standard) | $0.60 | $2.40 |\n| MiniMax M3 (promo) | $0.30 | $1.20 |\n| Claude Opus 4.7 | $5.00 | $25.00 |\n| GPT-5.5 | ~$10.00 | ~$30.00 |\n\n## Three Ways to Use It\n\nMiniMax M3 is available through three paths depending on how much control you want:\n\n**MiniMax API**: First-party, OpenAI-compatible endpoint at `api.minimax.io`. Native multimodal support, thinking mode toggle, and the full 1M context window. This is the path with the data governance considerations discussed below.\n\n**OpenRouter**: Fastest route for testing. 
Set the model to `minimax/minimax-m3` using existing OpenRouter credentials and you are running within minutes.\n\n**Self-hosted open weights**: The model is on [Hugging Face (MiniMaxAI/MiniMax-M3)](https://huggingface.co/MiniMaxAI/MiniMax-M3) with SGLang, vLLM, and Transformers support. Quantized builds for llama.cpp, Ollama, and LM Studio are available. Fair warning: the full BF16 checkpoint is a 427B-parameter model, with 23B parameters active per inference. That is a real infrastructure commitment.\n\nFor a quick test, a minimal request through OpenRouter looks like this:\n\n```\ncurl https://openrouter.ai/api/v1/chat/completions \\\n  -H \"Authorization: Bearer $OPENROUTER_API_KEY\" \\\n  -H \"Content-Type: application/json\" \\\n  -d '{\"model\": \"minimax/minimax-m3\", \"messages\": [{\"role\": \"user\", \"content\": \"Review this code for bugs and performance issues.\"}]}'\n```\n\n## One Question Worth Answering Before You Deploy\n\nMiniMax is a Shanghai-based company subject to China’s 2017 National Intelligence Law, which requires cooperation with state intelligence operations. For the hosted API, this is a compliance question that deserves an explicit answer depending on what you are sending through the model.\n\nSelf-hosting the [open weights](https://huggingface.co/MiniMaxAI/MiniMax-M3) avoids this concern, because prompts and outputs never leave your own infrastructure. If your workloads involve regulated data or sensitive IP, the self-hosted path is the right call regardless of the cost story.\n\n## Bottom Line\n\nMiniMax M3 is a technically serious model with a cost structure that changes the economics of agentic workflows. It is not a drop-in replacement for Claude Opus on sensitive workloads, and its benchmark numbers need independent verification. But for long-horizon coding agents, large-document pipelines, and high-volume inference where the numbers check out on your tasks, it has earned a real evaluation, not just a footnote.\n\nRun it on one real workflow first. 
The [official launch post](https://www.minimax.io/blog/minimax-m3) covers the architecture in depth, and the [MSA technical report](https://arxiv.org/abs/2606.13392) has the attention mechanism specifics for those who want to go deeper. For a warts-and-all practical evaluation, the [agentic workflow evaluation on Medium](https://medium.com/@cognidownunder/i-evaluated-minimax-m3-for-agentic-workflows-the-results-are-complicated-518b60d5e6a9) is worth reading before you commit.", "url": "https://wpnews.pro/news/minimax-m3-open-weight-frontier-model-at-5-of-opus-cost", "canonical_source": "https://byteiota.com/minimax-m3-open-weight-model/", "published_at": "2026-06-18 01:08:13+00:00", "updated_at": "2026-06-18 01:27:53.712918+00:00", "lang": "en", "topics": ["large-language-models", "ai-products", "ai-infrastructure", "ai-research", "developer-tools"], "entities": ["MiniMax", "Claude Opus", "GPT-5.5", "Gemini 3.1 Pro", "OpenRouter", "Hugging Face", "SGLang", "vLLM"], "alternates": {"html": "https://wpnews.pro/news/minimax-m3-open-weight-frontier-model-at-5-of-opus-cost", "markdown": "https://wpnews.pro/news/minimax-m3-open-weight-frontier-model-at-5-of-opus-cost.md", "text": "https://wpnews.pro/news/minimax-m3-open-weight-frontier-model-at-5-of-opus-cost.txt", "jsonld": "https://wpnews.pro/news/minimax-m3-open-weight-frontier-model-at-5-of-opus-cost.jsonld"}}