MAI-Thinking-1: Microsoft's New Reasoning Model and What It Means for Developers Microsoft has released MAI-Thinking-1, its first in-house reasoning model built from scratch by the Microsoft AI lab using a sparse Mixture of Experts architecture. The medium-sized model achieves 97% on the AIME 2025 math benchmark and matches Claude Opus 4.6 on SWE-Bench Pro for software engineering tasks, while operating at significantly lower inference costs than comparable dense models. Microsoft trained the model without third-party distillation, using only commercially licensed data, and built the entire training infrastructure on its own accelerators. Microsoft just shipped MAI-Thinking-1, their first in-house reasoning model. If you've been watching the AI space, you know reasoning models — the kind that "think before they answer" — have become a battleground. OpenAI has o3, Anthropic has Claude with extended thinking, Google has Gemini's thinking mode. Now Microsoft is in with their own, and they built it from the ground up rather than licensing or distilling from someone else's model. Here is what you actually need to know as a developer. MAI-Thinking-1 is Microsoft's reasoning-focused language model, developed by their internal AI lab Microsoft AI, or MAI . It is a medium-sized model designed specifically for complex, multi-step tasks — the kind of problems where a model needs to reason through multiple steps before producing an answer, rather than just pattern-matching to a response. The headline positioning is this: it is a smaller model that punches well above its weight class on software engineering and math benchmarks. The model is a sparse Mixture of Experts MoE architecture: This distinction matters for developers. In a dense model, every parameter fires for every token. In a MoE model, only a subset of "experts" activate per token, so the active compute footprint is much smaller than the total parameter count suggests. The practical result: you get near-frontier quality reasoning at a significantly lower inference cost than a comparable dense model. Compare that to something like GPT-4 class models which are estimated at 1.8T+ parameters dense , and you start to see why Microsoft is calling this "mid-weight pricing." Microsoft reports the following numbers: | Benchmark | MAI-Thinking-1 | Notes | |---|---|---| | AIME 2025 | 97.0% | Advanced math competition | | AIME 2026 | 94.5% | Most recent math competition | | SWE-Bench Pro | Competitive with Claude Opus 4.6 | Real-world software engineering tasks | | Human side-by-side | Preferred over Claude Sonnet 4.6 | Blind evaluation by Surge raters | The SWE-Bench Pro result is worth unpacking. SWE-Bench tests models on real GitHub issues — the model has to read a codebase, understand a bug report, and produce a patch that passes the existing test suite. It is arguably the most developer-relevant benchmark that exists right now. Matching Claude Opus 4.6 on this benchmark while running on far fewer active parameters is a meaningful result. The human preference eval covered 1,276 tasks across single-turn and multi-turn conversations, judged by professional raters from Surge, and prioritized whether responses actually advanced the user's goals rather than just sounding good. Microsoft made a deliberate choice that is worth understanding because it affects how the model behaves. No distillation from third-party models. Most smaller models are trained by learning to imitate a larger, more capable model this is called distillation or knowledge distillation . MAI-Thinking-1 was trained without doing this. Microsoft argues that distilled models are fundamentally bound to the design choices of their teacher model and struggle to generalize to new situations. Training from scratch on their own data means the model has to genuinely learn reasoning rather than mimicking it. Clean, licensed training data only. All pre-training data was commercially licensed, and AI-generated content was excluded from pre-training. For enterprises, this matters a lot: it affects copyright exposure and gives Microsoft better ability to explain and improve model behavior. In-house training infrastructure end-to-end. From hardware co-design on Microsoft's own accelerators to the reinforcement learning framework, the entire training stack is built internally. This is what they call the "Hill-Climbing Machine" — a system where every component can be improved independently, so capabilities improve continuously rather than requiring architectural overhauls. Before you think about API calls, here is the feature set: Context window: 256,000 tokens. That is roughly 600 pages of text. You can fit entire codebases, large contracts, or lengthy research documents in a single context. For agentic coding workflows this is essential. Function calling / tool use. Supported. If you are building agents that need to call APIs, query databases, or interact with external services, the model can handle structured tool calls in the standard format. System prompt / developer instructions. The model was trained to follow multi-layer instructions — meaning system prompts, user instructions, and constraints stack and interact predictably rather than the model silently ignoring one in favor of another. Chat Completions API compatibility. This is significant. The API uses the same interface as the widely adopted OpenAI Chat Completions format. If you already have code that calls Azure OpenAI or any OpenAI-compatible endpoint, migration should require minimal changes — primarily just swapping the model name and endpoint URL. Enterprise security via Microsoft Foundry. All MAI models come with Microsoft Foundry's compliance stack: data residency controls, audit logging, private networking options. If you are building in a regulated industry, this is the access path that gets you the compliance paperwork you need. Since the model is Chat Completions API-compatible, here is what calling it will look like once you have Foundry access. The pattern is essentially identical to calling Azure OpenAI: python import openai client = openai.AzureOpenAI azure endpoint="https://