Kimi K2.7-Code: Open-Source 1T Coding Agent, 30% Fewer Thinking Tokens


Source: byteiota.com · Published 2026-06-28T21:16Z

Moonshot AI released Kimi K2.7-Code, an open-source 1-trillion-parameter coding agent, on June 25, claiming 30% fewer thinking tokens and higher benchmark scores than its predecessor. The model uses a Mixture-of-Experts architecture with 32 billion active parameters per forward pass, and its output tokens are priced at roughly a quarter of Claude Sonnet 4.6’s. However, independent verification on standard benchmarks like SWE-Bench is pending, and the model lacks a non-thinking mode, making it best suited for long-horizon agentic tasks.

Moonshot AI dropped Kimi K2.7-Code on June 25. It scores higher on every benchmark the company publishes and uses 30% fewer thinking tokens than its predecessor. That second part is the one worth paying attention to: in a year when every model improvement has come with a larger inference bill, K2.7-Code moves in the opposite direction.

Whether that claim holds up outside Moonshot’s own test suites is a different question, one that doesn’t have an answer yet. But the efficiency angle is real, the weights are on HuggingFace under a Modified MIT license, and the API is live at output-token rates that undercut Claude Sonnet 4.6 by roughly a factor of four. Here’s what you actually need to know.

What Changed from K2.6

K2.7-Code is a coding-specialized fine-tune on top of the same 1-trillion-parameter Mixture-of-Experts architecture that powered K2.6. The headline numbers Moonshot published:

+21.8% on Kimi Code Bench v2 (50.9 → 62.0)
+11.0% on Program Bench
+31.5% on MLS Bench Lite
81.1% on MCPMark Verified tool invocation (vs Claude Opus 4.8 at 76.4%)
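
The first figure in the list quotes a relative gain, which is easy to misread as a jump in raw score. As a quick check, and assuming the 50.9 and 62.0 values are scores on the same scale (the source does not describe the scale), the arithmetic below separates the relative improvement from the absolute one.

```python
def relative_gain(old: float, new: float) -> float:
    """Percentage improvement of `new` over `old`, relative to `old`."""
    return (new - old) / old * 100.0


def absolute_gain(old: float, new: float) -> float:
    """Improvement in raw score points."""
    return new - old


# Kimi Code Bench v2 figures quoted in the article (K2.6 -> K2.7-Code).
old_score, new_score = 50.9, 62.0

print(f"relative: +{relative_gain(old_score, new_score):.1f}%")        # relative: +21.8%
print(f"absolute: +{absolute_gain(old_score, new_score):.1f} points")  # absolute: +11.1 points
```

So the headline +21.8% corresponds to an 11.1-point move on Moonshot's own benchmark. The other bullets do not give before-and-after scores, so the same check cannot be run on them.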

A caveat that belongs up front: every benchmark above is a Moonshot-designed proprietary suite. There are no independent results on SWE-Bench Verified, SWE-Bench Pro, or Terminal-Bench 2.0 as of this writing. K2.6, by contrast, had public SWE-Bench Verified results (80.2%). That K2.7-Code skipped those at launch is notable, and the practitioner community has said so explicitly.

Treat “21.8% better than our last model on our own coding eval” as a credible signal, not a settled fact.

The Architecture: 1 Trillion Parameters, 32 Billion Active

The efficiency story makes more sense once you understand the Mixture-of-Experts structure. K2.7-Code has 1 trillion total parameters spread across 384 expert networks, but only 8 experts fire on any given token, meaning about 32 billion parameters are active per forward pass. That’s roughly 3% of the total. The model is as expensive to store as any 1-trillion-parameter model, but its per-token compute is closer to that of a dense 32B model.
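
To make the "8 of 384 experts per token" idea concrete, here is a minimal sketch of generic top-k expert routing. It is not Moonshot's router: the random logits stand in for a learned gating network, and the softmax-over-selected-experts weighting is one common choice that the source does not confirm. Only the expert count, the top-k value, and the parameter totals come from the article.

```python
import math
import random

NUM_EXPERTS = 384  # expert count quoted in the article
TOP_K = 8          # experts activated per token, per the article


def route_token(router_logits: list[float], k: int = TOP_K) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


random.seed(0)
# Hypothetical router scores for one token; a real model computes these from the hidden state.
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
weights = route_token(logits)

print(f"experts used: {len(weights)} of {NUM_EXPERTS}")   # experts used: 8 of 384
print(f"active parameter share: {32e9 / 1e12:.1%}")       # active parameter share: 3.2%
```

Note that 8 of 384 experts is about 2.1% of the experts, while 32B of 1T is 3.2% of the parameters. The article does not break down the difference; presumably the remainder is parameters every token uses, such as attention layers and embeddings.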

The 30% thinking token reduction comes from post-training improvements to how the model plans its reasoning chains — not from architectural changes. Moonshot retrained the model to reach correct conclusions with fewer intermediate steps. Whether that matters in practice depends heavily on your workload.

The Catch: No Non-Thinking Mode

K2.7-Code always runs with extended reasoning enabled. There is no fast path, no non-thinking mode, no way to skip the chain-of-thought. If you want quick completions, simple autocomplete, or low-latency responses for interactive coding, K2.6 is still the better option.

K2.7-Code is built for the opposite use case: long-horizon agentic sessions, multi-file refactoring, complex debugging chains, and tool-use workflows that can sustain 12+ hours of continuous execution. Moonshot’s own Kimi Code plugin supports agent swarms of up to 300 parallel sub-agents on higher tiers. This is not a model you reach for to quickly generate a function — it’s a model you point at a codebase and leave running.

Cost and Access

The pricing is competitive with open-weight alternatives and aggressive relative to frontier proprietary models:

Model	Input ($ per 1M tokens)	Output ($ per 1M tokens)
K2.7-Code (Kimi API)	$0.95	$4.00
K2.7-Code (OpenRouter)	$0.75	$3.50
Claude Sonnet 4.6	$3.00	$15.00

On output tokens, where agent loop costs actually accumulate, K2.7-Code is roughly 4x cheaper than Sonnet 4.6 (3.75x on the Kimi API, about 4.3x via OpenRouter). The 30% reduction in thinking tokens, which Moonshot measures against K2.6, compounds with that price gap for high-volume agentic workloads.
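
A rough cost sketch shows how the price gap and the token reduction combine. The prices are the ones in the table; the session size (2M input tokens, 1M output tokens) and the 60% thinking share are made-up illustrative values, and the helper names are hypothetical. The sketch also assumes every model would emit the same number of tokens for the same task, which the source does not establish.

```python
PRICES = {  # USD per 1M tokens (input, output), from the pricing table
    "K2.7-Code (Kimi API)": (0.95, 4.00),
    "K2.7-Code (OpenRouter)": (0.75, 3.50),
    "Claude Sonnet 4.6": (3.00, 15.00),
}


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one session at the listed per-million-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def output_after_reduction(output_tokens: int, thinking_share: float,
                           reduction: float = 0.30) -> int:
    """Output tokens left after trimming only the thinking portion by `reduction`."""
    thinking = output_tokens * thinking_share
    return round(output_tokens - thinking * reduction)


# Hypothetical agent session: 2M input tokens, 1M output tokens.
for name in PRICES:
    print(f"{name}: ${session_cost(name, 2_000_000, 1_000_000):.2f}")

# If 60% of the output were thinking tokens (an assumption), a 30% cut to
# that portion leaves 820,000 output tokens to pay for.
trimmed = output_after_reduction(1_000_000, thinking_share=0.6)
print(f"trimmed output tokens: {trimmed}")
print(f"K2.7-Code (Kimi API), trimmed: ${session_cost('K2.7-Code (Kimi API)', 2_000_000, trimmed):.2f}")
```

With these inputs the untrimmed session costs $5.90 on the Kimi API, $5.00 via OpenRouter, and $21.00 on Sonnet 4.6. The real saving depends on how much of your output is thinking tokens, so measure that share on your own workload before relying on the estimate.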

If you’re already using Claude Code, you can route it through Moonshot’s API with two environment variables:

export ANTHROPIC_BASE_URL="https://api.moonshot.ai/v1"
export ANTHROPIC_AUTH_TOKEN="your_kimi_api_key"

Your Claude Code workflow stays identical; the backend switches to K2.7-Code. Retrieve your key from platform.kimi.ai.

For self-hosting: the full INT4-quantized weights are at moonshotai/Kimi-K2.7-Code on HuggingFace under a Modified MIT license that permits commercial use. A realistic hardware requirement is 8x H200-class GPUs (~640GB VRAM). Community GGUF builds from Unsloth work with llama.cpp, Ollama, and LM Studio for more modest setups. Deployment uses vLLM 0.19.1+ or SGLang with the kimi_k2 tool-call parser flag.
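
The memory figure is easier to judge with a back-of-the-envelope estimate of the raw weight footprint. This is a sketch under simplifying assumptions: every parameter is stored at the stated bit width, and quantization scales, any layers kept at higher precision, KV cache, and activations are ignored, so real usage will be higher.

```python
def weight_footprint_gb(params: float, bits_per_param: float) -> float:
    """Raw weight storage in decimal gigabytes, ignoring quantization metadata."""
    return params * bits_per_param / 8 / 1e9


total_params = 1e12  # 1 trillion parameters, per the article

print(f"INT4: {weight_footprint_gb(total_params, 4):.0f} GB")   # INT4: 500 GB
print(f"BF16: {weight_footprint_gb(total_params, 16):.0f} GB")  # BF16: 2000 GB
```

At 4 bits per parameter the weights alone come to about 500 GB, which is in line with the ~640GB figure once runtime overhead is added. The sparse activation described earlier reduces compute per token, not the memory needed to hold all experts.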

Who Should Try It Now

If you’re running cost-sensitive agent loops that currently use Sonnet 4.6 or another proprietary model, K2.7-Code is worth evaluating immediately. The price delta is large enough to justify the test even without independent benchmark validation.

If you need confidence from independent SWE-Bench or Terminal-Bench results before committing to a production switch, wait a few weeks. Community-run numbers should follow soon.

If your use case is anything other than long-horizon agentic coding, K2.6 or a different model is probably a better fit. K2.7-Code’s “always thinking” constraint makes it the wrong tool for fast, lightweight tasks — no matter how good the benchmark headlines look.

The open weights are there, the license is permissive, and the efficiency gains appear real. The benchmark claims need outside verification before they become gospel. Both things are true simultaneously.
