{"slug": "kimi-k2-7-code-open-source-1t-coding-agent-30-fewer-thinking-tokens", "title": "Kimi K2.7-Code: Open-Source 1T Coding Agent, 30% Fewer Thinking Tokens", "summary": "Moonshot AI released Kimi K2.7-Code, an open-source 1-trillion-parameter coding agent, on June 25, claiming 30% fewer thinking tokens and higher benchmark scores than its predecessor. The model uses a Mixture-of-Experts architecture with 32 billion active parameters per forward pass, offering competitive pricing at roughly 4x cheaper than Claude Sonnet 4.6 for output tokens. However, independent verification on standard benchmarks like SWE-Bench is pending, and the model lacks a non-thinking mode, making it best suited for long-horizon agentic tasks.", "body_md": "Moonshot AI dropped Kimi K2.7-Code on June 25. It scores higher on every benchmark the company publishes and uses 30% fewer thinking tokens than its predecessor. That second part is the one worth paying attention to: in a year when every model improvement has come with a larger inference bill, K2.7-Code moves in the opposite direction.\n\nWhether that claim holds up outside Moonshot’s own test suites is a different question — one that doesn’t have an answer yet. But the efficiency angle is real, the weights are on HuggingFace under a Modified MIT license, and the API is live at rates that undercut Claude Sonnet 4.6 by a factor of four. Here’s what you actually need to know.\n\n## What Changed from K2.6\n\nK2.7-Code is a coding-specialized fine-tune on top of the same 1-trillion-parameter Mixture-of-Experts architecture that powered K2.6. The headline numbers Moonshot published:\n\n- +21.8% on Kimi Code Bench v2 (50.9 → 62.0)\n- +11.0% on Program Bench\n- +31.5% on MLS Bench Lite\n- 81.1% on MCPMark Verified tool invocation (vs Claude Opus 4.8 at 76.4%)\n\nA caveat that belongs up front: every benchmark above is a Moonshot-designed proprietary suite. 
There are no independent results on [SWE-Bench Verified](https://www.swebench.com/), SWE-Bench Pro, or Terminal-Bench 2.0 as of this writing. K2.6, by contrast, had public SWE-Bench Verified results (80.2%). That K2.7-Code skipped those at launch is notable, and the practitioner community has said so explicitly.\n\nTreat “21.8% better than our last model on our own coding eval” as a credible signal, not a settled fact.\n\n## The Architecture: 1 Trillion Parameters, 32 Billion Active\n\nThe efficiency story makes more sense once you understand the [Mixture-of-Experts structure](https://huggingface.co/blog/moe). K2.7-Code has 1 trillion total parameters spread across 384 expert networks, but only 8 experts fire on any given token, so about 32 billion parameters are active per forward pass. That’s roughly 3% of the total. The model is expensive to store, but its per-token compute is closer to that of a dense 32B model than to that of a dense 1T model.\n\nThe 30% thinking token reduction comes from post-training improvements to how the model plans its reasoning chains, not from architectural changes. Moonshot retrained the model to reach correct conclusions with fewer intermediate steps. How much that saves in practice depends on how much of your token spend goes to reasoning rather than final output.\n\n## The Catch: No Non-Thinking Mode\n\nK2.7-Code always runs with extended reasoning enabled. There is no fast path, no non-thinking mode, no way to skip the chain-of-thought. If you want quick completions, simple autocomplete, or low-latency responses for interactive coding, K2.6 is still the better option.\n\nK2.7-Code is built for the opposite use case: long-horizon agentic sessions, multi-file refactoring, complex debugging chains, and tool-use workflows that can sustain 12+ hours of continuous execution. Moonshot’s own Kimi Code plugin supports agent swarms of up to 300 parallel sub-agents on higher tiers. 
This is not a model you reach for to quickly generate a function; it’s a model you point at a codebase and leave running.\n\n## Cost and Access\n\nThe pricing is competitive with open-weight alternatives and aggressive relative to frontier proprietary models:\n\n| Model | Input $/M | Output $/M |\n|---|---|---|\n| K2.7-Code (Kimi API) | $0.95 | $4.00 |\n| K2.7-Code (OpenRouter) | $0.75 | $3.50 |\n| Claude Sonnet 4.6 | $3.00 | $15.00 |\n\nOn output tokens, where agent loop costs actually accumulate, K2.7-Code is roughly 4x cheaper than Sonnet 4.6. A 30% reduction in thinking tokens on top of that is a meaningful compounding effect for high-volume agentic workloads.\n\nIf you’re already using Claude Code, you can route it through Moonshot’s API with two environment variables:\n\n```\nexport ANTHROPIC_BASE_URL=\"https://api.moonshot.ai/v1\"\nexport ANTHROPIC_AUTH_TOKEN=\"your_kimi_api_key\"\n```\n\nYour Claude Code workflow stays identical; the backend switches to K2.7-Code. Retrieve your key from [platform.kimi.ai](https://platform.kimi.ai).\n\nFor self-hosting: the full INT4-quantized weights are at [moonshotai/Kimi-K2.7-Code on HuggingFace](https://huggingface.co/moonshotai/Kimi-K2.7-Code) under a Modified MIT license that permits commercial use. A realistic hardware requirement is 8x H200-class GPUs (~640GB VRAM). Community GGUF builds from Unsloth work with llama.cpp, Ollama, and LM Studio for more modest setups. Deployment uses [vLLM](https://github.com/vllm-project/vllm) 0.19.1+ or SGLang with the `kimi_k2` tool-call parser flag.\n\n## Who Should Try It Now\n\nIf you’re running cost-sensitive agent loops that currently use Sonnet 4.6 or another proprietary model, K2.7-Code is worth evaluating immediately. The price delta is large enough to justify the test even without independent benchmark validation.\n\nIf you need confidence from independent SWE-Bench or Terminal-Bench results before committing to a production switch, wait a few weeks. 
The community will have those numbers soon.\n\nIf your use case is anything other than long-horizon agentic coding, K2.6 or a different model is probably a better fit. K2.7-Code’s “always thinking” constraint makes it the wrong tool for fast, lightweight tasks — no matter how good the benchmark headlines look.\n\nThe open weights are there, the license is permissive, and the efficiency gains appear real. The benchmark claims need outside verification before they become gospel. Both things are true simultaneously.", "url": "https://wpnews.pro/news/kimi-k2-7-code-open-source-1t-coding-agent-30-fewer-thinking-tokens", "canonical_source": "https://byteiota.com/kimi-k2-7-code-open-source-1t-coding-agent-30-fewer-thinking-tokens/", "published_at": "2026-06-28 21:16:46+00:00", "updated_at": "2026-06-28 23:18:41.285250+00:00", "lang": "en", "topics": ["large-language-models", "ai-agents", "ai-products", "ai-tools", "ai-infrastructure"], "entities": ["Moonshot AI", "Kimi K2.7-Code", "Claude Sonnet 4.6", "HuggingFace", "OpenRouter", "Claude Code", "Anthropic"], "alternates": {"html": "https://wpnews.pro/news/kimi-k2-7-code-open-source-1t-coding-agent-30-fewer-thinking-tokens", "markdown": "https://wpnews.pro/news/kimi-k2-7-code-open-source-1t-coding-agent-30-fewer-thinking-tokens.md", "text": "https://wpnews.pro/news/kimi-k2-7-code-open-source-1t-coding-agent-30-fewer-thinking-tokens.txt", "jsonld": "https://wpnews.pro/news/kimi-k2-7-code-open-source-1t-coding-agent-30-fewer-thinking-tokens.jsonld"}}