# Cohere North Mini Code: 30B Open-Source Agent That Beats 120B Models

> Source: <https://byteiota.com/cohere-north-mini-code/>
> Published: 2026-06-15 15:08:31+00:00

Cohere shipped [North Mini Code](https://cohere.com/blog/north-mini-code) on June 9 — a 30 billion parameter open-source coding agent, Apache 2.0 licensed, that runs on a single H100 and outperforms models four times its size on standard coding benchmarks. If your team is paying API fees to run coding agents against proprietary models, this warrants serious attention.

## The Benchmark Numbers Are Real

North Mini Code scores 33.4 on the [Artificial Analysis Coding Index](https://kilo.ai/open-source-models) — above Nemotron 3 Super (120B) and Devstral 2 (123B). On SWE-Bench Verified (the real-world multi-file coding benchmark that actually matters), it hits 80.2% pass@10. Claude Opus 4.6 sits at 80.8%. The gap is 0.6 percentage points.

Speed tells the same story. At 210 tokens per second output with a 0.25-second time-to-first-token — against a class median of 1.95 seconds — North Mini Code ranks 8th of 127 comparable models. It also delivers 2.8x higher output throughput than Devstral Small 2 under identical hardware configurations.

The “30B beats 120B” framing is not marketing. The architecture is why.

## Why MoE Changes the Math

North Mini Code uses a sparse Mixture-of-Experts architecture: 30 billion total parameters, but only 3 billion activate per token. Of its 128 experts, just 8 fire on any given forward pass. Inference cost scales with active parameters, not total parameters — so you get near-30B-scale reasoning at roughly 3B-scale compute.

The attention design is also non-standard: interleaved sliding-window attention with RoPE and global attention without positional embeddings, in a 3:1 ratio. The model was post-trained in two stages — supervised fine-tuning followed by reinforcement learning with verifiable rewards (RLVR), which added 7.9 percentage points on Terminal-Bench v2 and 3 points on SWE-Bench Verified over the SFT-only checkpoint.

This is a well-engineered model, not a lucky benchmark result.

## How to Deploy It Today

The minimum hardware bar is one H100 at FP8. Cohere provides [weights on Hugging Face](https://huggingface.co/CohereLabs/North-Mini-Code-1.0) under Apache 2.0 — no commercial restrictions, fine-tunable, deployable anywhere. Serve it with vLLM or SGLang:

```
huggingface-cli download CohereLabs/North-Mini-Code-1.0

vllm serve CohereLabs/North-Mini-Code-1.0 \
  --dtype fp8 \
  --max-model-len 32768
```

The API is OpenAI-compatible. Drop it into any existing agentic pipeline without changing tool interfaces. For teams that need air-gapped deployment — finance, healthcare, government — download once, transfer via archive, and serve internally. No outbound traffic required.

If you don’t want to self-host, North Mini Code is also available through the [Cohere API, Model Vault, and OpenRouter](https://docs.cohere.com/docs/north-mini-code-1.0).

## The Caveat You Should Know About

In independent Artificial Analysis benchmarking, North Mini Code generated 75 million output tokens — versus a class median of 25 million. It produces three times the output of comparable models. On complex agentic tasks where quality and completeness matter, that verbosity is acceptable. In high-throughput production workloads where you’re running thousands of simple completions per hour, it will triple your token costs and add latency.

Know your use case before deploying at scale.

## Where It Fits in the Open-Source Coding Race

The June 2026 open-weight coding model landscape is genuinely competitive. Kimi K2.7-Code (Moonshot’s 1T-parameter MoE, MIT license) leads the agentic benchmarks. Devstral 2 from Mistral sits at 72.2% SWE-Bench Verified with a 123B parameter count. Qwen 3.6 from Alibaba brings 1M-token context. The gap between open-weight and proprietary models has compressed to single-digit percentage points.

North Mini Code’s differentiator is not the highest benchmark score — it’s the deployment footprint. One H100, Apache 2.0, OpenAI-compatible, 256K context. That combination gives engineering teams a credible, self-hostable alternative to Claude Fable 5 and GPT-5 for coding agent pipelines.

Cohere’s broader play here is worth noting: after acquiring Aleph Alpha, the company is positioning itself as the sovereign AI provider — the choice for organizations that need frontier-grade AI without sending their code and context to someone else’s infrastructure. North Mini Code is the developer-facing entry point for that pitch.

## The Verdict

For teams building agentic coding workflows on controlled infrastructure, North Mini Code is currently the best argument for cutting proprietary API dependency. One H100, no license headaches, OpenAI-compatible APIs, and benchmark performance that sits within a fraction of a percent of the best closed models. The verbosity tax is real — test it against your specific workload before committing — but for complex multi-file engineering tasks, it earns its token count.

The open-source coding model race is moving fast enough that the justification for paying proprietary API rates is getting harder to defend with every release. This is another data point in that direction.
