# MiniMax M3: Open-Weight Frontier Model at 5% of Opus Cost

> Source: <https://byteiota.com/minimax-m3-open-weight-model/>
> Published: 2026-06-18 01:08:13+00:00

MiniMax just released an open-weight model that costs roughly 5% of Claude Opus per task, hits 59% on SWE-Bench Pro, and runs a 1-million-token context window at one-twentieth the compute of its predecessor. Those numbers deserve scrutiny — they are all vendor-published — but [MiniMax M3](https://www.minimax.io/blog/minimax-m3) is now the most cost-competitive option for developers building agentic systems, and it is worth understanding how it works before deciding whether to trust it.

## What Makes 1M Context Actually Practical

Long context claims are common. Most models slow to a crawl or produce degraded output as they approach their limits. MiniMax M3 addresses this with a new attention architecture called MSA — MiniMax Sparse Attention — which selects relevant key-value blocks rather than computing full attention across the entire context window.

The result: at 1 million tokens, per-token compute drops to one-twentieth of the previous M2 generation, delivering 9x faster prefill and 15x faster decode at max context. MiniMax demonstrated this with a 24-hour autonomous CUDA kernel optimization task (improving hardware utilization from 7.6% to 71.3%) and a 12-hour paper reproduction task, both completed without human intervention. Those are the kinds of long-horizon workflows where a large context window actually earns its keep.

## The Benchmarks: What the Numbers Say (and Don’t)

MiniMax M3 scores 59.0% on SWE-Bench Pro, 66.0% on Terminal-Bench 2.1, and 83.5 on BrowseComp — figures that reportedly put it above GPT-5.5 and Gemini 3.1 Pro on coding, and above Claude Opus 4.7 on autonomous browsing. The [MSA technical report](https://arxiv.org/abs/2606.13392) was published on arXiv on June 11.

The necessary caveat: every one of those results was run by MiniMax, on MiniMax infrastructure, using MiniMax agent scaffolding. Independent third-party benchmarks were not available at launch on June 1. The comparison baseline is Claude Opus 4.7, not the more recently released 4.8 — so the gap to the current frontier is larger than the launch materials suggest. Benchmark your actual workflows before committing to production.

## The Cost Breakdown

The economics are where M3 makes its strongest case. Standard API pricing is $0.60 per million input tokens and $2.40 per million output tokens. A task consuming 500,000 input tokens and 100,000 output tokens runs at roughly $0.27 with M3, versus $5.00 with Claude Opus — about 5% of the cost. The promotional rate cuts that further, though you should plan against standard rates for production budgeting.

| Model | Input / 1M tokens | Output / 1M tokens |
|---|---|---|
| MiniMax M3 (standard) | $0.60 | $2.40 |
| MiniMax M3 (promo) | $0.30 | $1.20 |
| Claude Opus 4.7 | $5.00 | $25.00 |
| GPT-5.5 | ~$10.00 | ~$30.00 |

## Three Ways to Use It

MiniMax M3 is available through three paths depending on how much control you want:

**MiniMax API** — First-party, OpenAI-compatible endpoint at `api.minimax.io`

. Native multimodal support, thinking mode toggle, and the full 1M context window. This is the path with the data governance considerations discussed below.

**OpenRouter** — Fastest route for testing. Set the model to `minimax/minimax-m3`

using existing OpenRouter credentials and you are running within minutes.

**Self-hosted open weights** — The model is on [Hugging Face (MiniMaxAI/MiniMax-M3)](https://huggingface.co/MiniMaxAI/MiniMax-M3) with SGLang, vLLM, and Transformers support. Quantized builds for llama.cpp, Ollama, and LM Studio are available. Fair warning: the full BF16 checkpoint is a 427B-parameter model, with 23B parameters active per inference. That is a real infrastructure commitment.

```
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "minimax/minimax-m3", "messages": [{"role": "user", "content": "Review this code for bugs and performance issues."}]}'
```

## One Question Worth Answering Before You Deploy

MiniMax is a Shanghai-based company subject to China’s 2017 National Intelligence Law, which requires cooperation with state intelligence operations. For the hosted API, this is a compliance question that deserves an explicit answer depending on what you are sending through the model.

Self-hosting the [open weights](https://huggingface.co/MiniMaxAI/MiniMax-M3) eliminates this concern entirely. If your workloads involve regulated data or sensitive IP, the self-hosted path is the right call regardless of the cost story.

## Bottom Line

MiniMax M3 is a technically serious model with a cost structure that changes the economics of agentic workflows. It is not a drop-in replacement for Claude Opus on sensitive workloads, and its benchmark numbers need independent verification. But for long-horizon coding agents, large-document pipelines, and high-volume inference where the numbers check out on your tasks, it has earned a real evaluation — not just a footnote.

Run it on one real workflow first. The [official launch post](https://www.minimax.io/blog/minimax-m3) covers the architecture in depth, and the [MSA technical report](https://arxiv.org/abs/2606.13392) has the attention mechanism specifics for those who want to go deeper. For a warts-and-all practical evaluation, the [agentic workflow evaluation on Medium](https://medium.com/@cognidownunder/i-evaluated-minimax-m3-for-agentic-workflows-the-results-are-complicated-518b60d5e6a9) is worth reading before you commit.
