MiniMax M3: Open-Weight Model That Beats GPT-5.5 on Coding

wpnews.pro


Source: byteiota.com | Published: 2026-07-01 03:09 UTC | Topic: large language models


MiniMax released M3, a 428-billion-parameter open-weight model, on June 7. It scores 59.0% on SWE-Bench Pro, slightly ahead of GPT-5.5's 58.6%, at $0.30 per million input tokens, making it more than 16 times cheaper than GPT-5.5 and about 50 times cheaper than Claude Opus 4.8 on input. However, MiniMax's launch comparison used Opus 4.7 rather than the newer Opus 4.8, which scores 69.2% on the same benchmark. M3 also introduces MiniMax Sparse Attention for efficient long-context processing and supports image and video input as well as agentic tasks.



MiniMax released the weights for M3, a 428-billion-parameter mixture-of-experts model, on June 7. It scores 59.0% on SWE-Bench Pro, edging past GPT-5.5's 58.6%, and costs $0.30 per million input tokens. That's more than 16 times cheaper than GPT-5.5 and about 50 times cheaper than Claude Opus 4.8 on input. There's a catch, though: MiniMax compared against Opus 4.7 at launch, conveniently leaving out Opus 4.8 (released three days earlier), which scores 69.2% on the same benchmark. M3 isn't a clean sweep, but at that price it doesn't need to be.

What M3 Actually Is

M3 has 428 billion parameters, of which roughly 23 billion are active per token thanks to mixture-of-experts (MoE) routing. The headline innovation is MiniMax Sparse Attention (MSA), a replacement for standard full attention that pre-filters the relevant KV-cache blocks instead of attending to every token. At one million tokens of context, MiniMax reports 9x faster prefill, 15x faster decoding, and one-twentieth the compute per token compared with its predecessor, M2. The model also accepts image and video input natively and can operate a desktop computer in agentic tasks.
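To make "roughly 23 billion active out of 428 billion" concrete, here is a minimal sketch of generic top-k mixture-of-experts routing in plain Python. The article does not describe M3's router, so the expert count, k, gate scores, and toy experts below are all hypothetical; the point is only that the k selected experts run for a given token, which is why per-token compute tracks active rather than total parameters.

```python
import math
from typing import Callable, List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    token: float,
    gate_logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int = 2,
) -> float:
    """Route one token to its top-k experts and mix their outputs.

    Toy model: a "token" is one float and each expert is a scalar function.
    Only the k selected experts are ever evaluated.
    """
    # Indices of the k highest-scoring experts (ties keep original order).
    top = sorted(
        range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True
    )[:k]
    # Renormalize the gate weights over the selected experts only.
    weights = softmax([gate_logits[i] for i in top])
    return sum(w * experts[i](token) for w, i in zip(weights, top))


# Hypothetical setup: 8 experts; expert i simply multiplies its input by i + 1.
experts = [lambda x, s=i + 1: s * x for i in range(8)]
gate_logits = [0.1, 2.0, -1.0, 0.5, 2.0, -0.3, 0.0, 1.2]

# Experts 1 and 4 tie for the top score, so each gets weight 0.5: 0.5*2 + 0.5*5.
print(moe_forward(1.0, gate_logits, experts, k=2))  # 3.5

# Share of parameters active per token, using the figures quoted above.
print(f"active fraction: {23e9 / 428e9:.1%}")  # 5.4%
```

Production routers work on vectors, add load-balancing terms, and batch tokens per expert; this sketch keeps only the select-then-mix step.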

Prior long-context models could technically fit a million tokens. M3 makes it economical to actually use them. That’s the architectural bet MiniMax is making: as agents need to hold entire codebases, conversation histories, and document sets in memory simultaneously, the efficiency of the attention mechanism stops being academic. MiniMax’s technical report shows M3 autonomously reproducing a research paper in 12 hours with 18 code commits, and optimizing a CUDA kernel from 7.6% to 71.3% hardware utilization across 147 iterations.
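The efficiency argument rests on the pre-filtering idea attributed to MSA: score blocks of the KV cache first, then run ordinary attention only inside the most relevant blocks. The sketch below illustrates generic block-sparse attention for a single query. Scoring each block by its mean key, and the `block_size` and `top_blocks` values, are hypothetical choices, not MiniMax's published design.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def block_sparse_attention(
    query: Sequence[float],
    keys: Sequence[Vector],
    values: Sequence[Vector],
    block_size: int = 4,
    top_blocks: int = 2,
) -> Vector:
    """Single-query attention restricted to the best-matching KV blocks.

    Hypothetical sketch: scoring each block by its mean key, and the default
    block_size/top_blocks, are illustrative choices, not MSA's actual rule.
    """
    d = len(query)
    n = len(keys)
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]

    # Step 1 (pre-filter): score each block by query . mean(key in block).
    def block_score(idx: range) -> float:
        mean_key = [sum(keys[i][j] for i in idx) / len(idx) for j in range(d)]
        return dot(query, mean_key)

    kept = sorted(blocks, key=block_score, reverse=True)[:top_blocks]
    selected = [i for idx in kept for i in idx]

    # Step 2: ordinary scaled dot-product attention over the kept tokens only.
    logits = [dot(query, keys[i]) / math.sqrt(d) for i in selected]
    peak = max(logits)
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, i in zip(weights, selected):
        for j, v in enumerate(values[i]):
            out[j] += (w / total) * v
    return out


# Toy cache: 12 tokens in 3 blocks of 4; only the middle block matches the query.
keys = [[0.0, 1.0]] * 4 + [[5.0, 0.0]] * 4 + [[0.0, -1.0]] * 4
values = [[0.0]] * 4 + [[1.0]] * 4 + [[-1.0]] * 4
print(block_sparse_attention([1.0, 0.0], keys, values, block_size=4, top_blocks=1))
```

With `top_blocks=1` the query does full dot products against only 4 of the 12 keys (plus three block summaries) and prints `[1.0]`; setting `top_blocks=3` falls back to ordinary full attention. A real system would presumably cache the block summaries rather than recompute them per query.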

The Benchmark Picture: Honest Version

Here’s what the numbers actually say:

Model	SWE-Bench Pro	BrowseComp	PostTrainBench	Input ($/M tokens)
Claude Opus 4.8	69.2%	~79%	0.42 (1st)	~$15
MiniMax M3	59.0%	83.5%	0.37 (3rd)	$0.30
GPT-5.5	58.6%	N/A	0.39 (2nd)	$5.00

M3 beats GPT-5.5 on SWE-Bench Pro by a narrow 0.4 points and leads on BrowseComp, which measures autonomous web-browsing agent tasks. But Opus 4.8 leads SWE-Bench Pro by about 10 points, and PostTrainBench (general instruction following) puts M3 third. MiniMax's own launch benchmarks cherry-picked Opus 4.7 as the comparison target. For an independent read on real-world performance, OpenRouter publishes live latency and throughput stats.
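The comparisons above reduce to simple arithmetic on the table. This short check uses only the table's numbers, taking Opus 4.8's approximate ~$15 input price at face value.

```python
# Figures copied from the table above; Opus 4.8's input price is approximate (~$15).
swe_bench_pro = {"Claude Opus 4.8": 69.2, "MiniMax M3": 59.0, "GPT-5.5": 58.6}
input_price_per_m = {"Claude Opus 4.8": 15.00, "MiniMax M3": 0.30, "GPT-5.5": 5.00}

m3 = "MiniMax M3"
for rival in ("GPT-5.5", "Claude Opus 4.8"):
    # Positive gap: rival scores higher than M3; negative: M3 is ahead.
    gap = swe_bench_pro[rival] - swe_bench_pro[m3]
    ratio = input_price_per_m[rival] / input_price_per_m[m3]
    print(f"{rival}: {gap:+.1f} pts vs M3, {ratio:.1f}x M3's input price")
```

It prints a -0.4-point gap and a 16.7x price multiple for GPT-5.5, and a +10.2-point gap and a 50.0x multiple for Opus 4.8. The article's "16 times" is that 16.7x rounded down.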

The Price Math for Production

For teams running AI agents in production, token costs compound fast. Consider a coding agent processing 10 million input tokens per day:

M3: $3/day
GPT-5.5: $50/day
Opus 4.8: $150/day

Over 30 days, M3 costs $90 for the same input volume that costs $1,500 on GPT-5.5 or $4,500 on Opus 4.8. For high-volume bug-fix pipelines, automated code review, or eval harnesses, that gap is hard to ignore. M3 is the default choice when you need frontier-range coding ability and Opus-tier polish isn't strictly required.
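The daily and 30-day figures are straight multiplication: tokens per day divided by one million, times the per-million input price, times the number of days. A small helper makes it easy to rerun the comparison with your own volumes. `input_cost_usd` is a hypothetical name, and like the article it counts input tokens only, so real bills that include output tokens will be higher.

```python
def input_cost_usd(tokens_per_day: int, price_per_million: float, days: int = 30) -> float:
    """Input-token spend only (output tokens are not counted), rounded to cents."""
    return round(tokens_per_day / 1_000_000 * price_per_million * days, 2)


# Input prices from the benchmark table (Opus 4.8's is approximate).
prices = {"MiniMax M3": 0.30, "GPT-5.5": 5.00, "Claude Opus 4.8": 15.00}
for model, price in prices.items():
    daily = input_cost_usd(10_000_000, price, days=1)
    monthly = input_cost_usd(10_000_000, price)
    print(f"{model}: ${daily:,.2f}/day, ${monthly:,.2f}/30 days")
```

At 10 million input tokens per day this reproduces $3, $50, and $150 per day, and $90, $1,500, and $4,500 over 30 days.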

Using M3 Today

The integration path is minimal. M3 exposes an OpenAI-compatible endpoint, so existing OpenAI SDK code only needs a new base URL, API key, and model name:

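import os

from openai import OpenAI

# Reuse the standard OpenAI SDK client, pointed at MiniMax's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="https://api.minimax.io/v1",
    # Read the key from an environment variable (name is arbitrary) instead of hard-coding it.
    api_key=os.environ["MINIMAX_API_KEY"],
)

response = client.chat.completions.create(
    model="MiniMax-M3",
    messages=[{"role": "user", "content": "Refactor this function..."}],
)
print(response.choices[0].message.content)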

LangChain integration works the same way: point ChatOpenAI at the MiniMax base URL.

Weights are live on Hugging Face for self-hosting. You'll need roughly 440GB of storage for the FP8 checkpoint and at least eight high-end GPUs with tensor parallelism. SGLang has official M3 support, and vLLM works with MSA support enabled. Mac Studio deployments via llama.cpp are also possible, though practical context limits will be below the 1M maximum.
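The roughly 440GB storage figure is consistent with back-of-envelope arithmetic: FP8 stores about one byte per parameter, so 428 billion parameters come to roughly 428 GB, with the remainder plausibly going to scale factors, any layers kept at higher precision, and file overhead. The sketch below assumes exactly one byte per parameter and an even eight-way split, both simplifications.

```python
def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


total_gb = weight_memory_gb(428e9, 1.0)  # assumption: FP8 ~= 1 byte per parameter
per_gpu_gb = total_gb / 8  # assumption: weights sharded evenly across 8 GPUs

print(f"weights: ~{total_gb:.0f} GB total, ~{per_gpu_gb:.1f} GB per GPU")
print(f"BF16 for comparison: ~{weight_memory_gb(428e9, 2.0):.0f} GB")
```

Each GPU would hold about 53.5 GB of weights before the KV cache and activations are counted, which is consistent with the article's eight-GPU minimum.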

Two Things to Know Before You Commit

First, the license. M3 ships under the MiniMax Community License, not Apache 2.0. Commercial use restrictions may apply to your use case. Read the terms before building a product on the weights—“open weights” and “fully open source” are not the same thing.

Second, context discipline. A one-million-token window is not an invitation to stuff everything in. Every token costs money, and filling the context for tasks that don't need it inflates cost without improving output. Use the window when the task demands it: long document analysis, full-codebase context, multi-hour agentic runs. For standard coding tasks, a shorter context with the same model usually delivers the same result at a fraction of the cost.
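One way to keep context intentional in a long-running agent is to cap each request at a token budget and drop the oldest turns first. This sketch uses OpenAI-style message dicts like the client example earlier. The helper names and the four-characters-per-token estimate are hypothetical stand-ins; a real deployment would count tokens with the model's own tokenizer.

```python
from typing import Dict, List

Message = Dict[str, str]


def approx_tokens(text: str) -> int:
    """Crude estimate of ~4 characters per token (assumption, not M3's tokenizer)."""
    return max(1, len(text) // 4)


def trim_history(messages: List[Message], budget: int) -> List[Message]:
    """Keep system messages plus the most recent turns that fit in `budget` tokens."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(approx_tokens(m["content"]) for m in system)
    kept: List[Message] = []
    # Walk backwards so the newest turns are kept first.
    for m in reversed(rest):
        cost = approx_tokens(m["content"])
        if used + cost > budget:
            break
        kept.append(m)
        used += cost
    return system + list(reversed(kept))


history = [
    {"role": "system", "content": "You are a careful code reviewer."},
    {"role": "user", "content": "x" * 4000},  # ~1,000 tokens of stale context
    {"role": "assistant", "content": "y" * 400},  # ~100 tokens
    {"role": "user", "content": "Review the latest diff."},
]
print([m["role"] for m in trim_history(history, budget=200)])
```

With a 200-token budget the stale 1,000-token turn is dropped and the call prints `['system', 'assistant', 'user']`. For scale, at $0.30 per million input tokens a request that fills the full one-million-token window costs about $0.30 in input alone, every time it is sent.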

Bottom Line

M3 is the right call for production coding agents and long-horizon agentic pipelines where Opus 4.8 is the quality ceiling but the budget says otherwise. It’s not a replacement for Opus on general instruction tasks. The benchmark cherry-picking at launch is a yellow flag—MiniMax knew the table looked cleaner without Opus 4.8 in it. But the underlying value holds: frontier-range coding at 16x lower input cost, open weights with self-hosting options, and a million-token context window that’s architecturally efficient rather than just technically possible. For cost-sensitive teams building on top of LLMs, that’s worth testing today.

Source: byteiota.com (original article)
