[ARTICLE · art-42720] src=byteiota.com ↗ pub= topic=large-language-models verified=true sentiment=· neutral

Kimi K2.7-Code: Open-Source 1T Coding Agent, 30% Fewer Thinking Tokens

Moonshot AI released Kimi K2.7-Code, an open-source 1-trillion-parameter coding agent, on June 25, claiming 30% fewer thinking tokens and higher benchmark scores than its predecessor. The model uses a Mixture-of-Experts architecture with 32 billion active parameters per forward pass, offering competitive pricing at roughly 4x cheaper than Claude Sonnet 4.6 for output tokens. However, independent verification on standard benchmarks like SWE-Bench is pending, and the model lacks a non-thinking mode, making it best suited for long-horizon agentic tasks.

Published Jun 28, 2026 · 4 min read

Moonshot AI dropped Kimi K2.7-Code on June 25. It scores higher on every benchmark the company publishes and uses 30% fewer thinking tokens than its predecessor. That second part is the one worth paying attention to: in a year when every model improvement has come with a larger inference bill, K2.7-Code moves in the opposite direction.

Whether that claim holds up outside Moonshot’s own test suites is a different question, and one that doesn’t have an answer yet. But the efficiency angle looks plausible, the weights are on HuggingFace under a Modified MIT license, and the API is live at rates that undercut Claude Sonnet 4.6 by roughly a factor of four on output tokens. Here’s what you actually need to know.

What Changed from K2.6

K2.7-Code is a coding-specialized fine-tune on top of the same 1-trillion-parameter Mixture-of-Experts architecture that powered K2.6. The headline numbers Moonshot published:

  • +21.8% on Kimi Code Bench v2 (50.9 → 62.0)
  • +11.0% on Program Bench
  • +31.5% on MLS Bench Lite
  • 81.1% on MCPMark Verified tool invocation (vs Claude Opus 4.8 at 76.4%)

A caveat that belongs up front: every benchmark above is a Moonshot-designed proprietary suite. There are no independent results on SWE-Bench Verified, SWE-Bench Pro, or Terminal-Bench 2.0 as of this writing. K2.6, by contrast, had public SWE-Bench Verified results (80.2%). That K2.7-Code skipped those at launch is notable, and the practitioner community has said so explicitly.

Treat “21.8% better than our last model on our own coding eval” as a credible signal, not a settled fact.
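The percentages in Moonshot’s list are relative gains, not percentage-point differences, which is easy to misread. A minimal sketch of the arithmetic, using only the two Kimi Code Bench v2 scores quoted above (the function and variable names are illustrative, not from Moonshot):

```python
def relative_gain(old: float, new: float) -> float:
    """Relative improvement of `new` over `old`, as a percentage."""
    return (new - old) / old * 100.0


# Kimi Code Bench v2 scores as published by Moonshot (K2.6 -> K2.7-Code).
old_score, new_score = 50.9, 62.0

points = new_score - old_score               # absolute gain in benchmark points
gain = relative_gain(old_score, new_score)   # relative gain in percent

print(f"{points:.1f} points absolute, {gain:.1f}% relative")
# prints "11.1 points absolute, 21.8% relative"
```

The other bullets give only the relative figure, so the underlying scores for Program Bench and MLS Bench Lite cannot be recovered from the article.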

The Architecture: 1 Trillion Parameters, 32 Billion Active

The efficiency story makes more sense once you understand the Mixture-of-Experts structure. K2.7-Code has 1 trillion total parameters spread across 384 expert networks, but only 8 experts fire on any given token, so about 32 billion parameters are active per forward pass. That’s roughly 3% of the total. The model is expensive to store, but its per-token compute is closer to that of a dense 32B model than a dense 1T one.
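To make the "8 of 384 experts" idea concrete, here is a minimal sketch of generic top-k expert routing. It is not Moonshot’s router: the random scores stand in for learned router logits, and the softmax over the selected experts is one common choice assumed here for illustration.

```python
import math
import random

NUM_EXPERTS = 384  # from the article
TOP_K = 8          # from the article


def route(scores: list[float], k: int = TOP_K) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(scores[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


random.seed(0)
# Stand-in router logits for a single token (hypothetical values).
scores = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
weights = route(scores)

expert_fraction = TOP_K / NUM_EXPERTS  # share of experts used per token
param_fraction = 32e9 / 1e12           # share of parameters active per token

print(len(weights), f"{expert_fraction:.1%}", f"{param_fraction:.1%}")
# prints "8 2.1% 3.2%"
```

Eight of 384 experts is about 2.1% of the experts, while 32B of 1T is 3.2% of the parameters. The gap is presumably made up of parameters that run for every token (attention, embeddings, any shared experts), though the article does not break this down.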

The 30% thinking token reduction comes from post-training improvements to how the model plans its reasoning chains — not from architectural changes. Moonshot retrained the model to reach correct conclusions with fewer intermediate steps. Whether that matters in practice depends heavily on your workload.

The Catch: No Non-Thinking Mode

K2.7-Code always runs with extended reasoning enabled. There is no fast path, no non-thinking mode, no way to skip the chain-of-thought. If you want quick completions, simple autocomplete, or low-latency responses for interactive coding, K2.6 is still the better option.

K2.7-Code is built for the opposite use case: long-horizon agentic sessions, multi-file refactoring, complex debugging chains, and tool-use workflows that can sustain 12+ hours of continuous execution. Moonshot’s own Kimi Code plugin supports agent swarms of up to 300 parallel sub-agents on higher tiers. This is not a model you reach for to quickly generate a function — it’s a model you point at a codebase and leave running.

Cost and Access

The pricing is competitive with open-weight alternatives and aggressive relative to frontier proprietary models:

Model                  | Input ($/M tokens) | Output ($/M tokens)
K2.7-Code (Kimi API)   | $0.95              | $4.00
K2.7-Code (OpenRouter) | $0.75              | $3.50
Claude Sonnet 4.6      | $3.00              | $15.00

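On output tokens, where agent loop costs actually accumulate, K2.7-Code is roughly 4x cheaper than Sonnet 4.6 ($15.00 versus $3.50 to $4.00 per million). The 30% reduction in thinking tokens is measured against K2.6, not Sonnet, but since thinking tokens are typically billed as output, it lowers the bill further for high-volume agentic workloads.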
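A rough way to see how these rates play out is to price a single agent session. The sketch below uses the prices from the table above. The token counts and the share of output spent on thinking are hypothetical placeholders, so substitute figures from your own logs.

```python
# (input $/M tokens, output $/M tokens), taken from the pricing table above.
PRICES = {
    "K2.7-Code (Kimi API)": (0.95, 4.00),
    "K2.7-Code (OpenRouter)": (0.75, 3.50),
    "Claude Sonnet 4.6": (3.00, 15.00),
}


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one session at the listed per-million-token rates."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical workload: 20M input tokens and 5M output tokens.
INPUT_TOKENS = 20_000_000
OUTPUT_TOKENS = 5_000_000

for model in PRICES:
    print(f"{model}: ${session_cost(model, INPUT_TOKENS, OUTPUT_TOKENS):.2f}")
# K2.7-Code (Kimi API): $39.00
# K2.7-Code (OpenRouter): $32.50
# Claude Sonnet 4.6: $135.00

# Assumption: 60% of output tokens are thinking tokens. A 30% cut in thinking
# tokens then removes 0.30 * 0.60 = 18% of all output tokens.
THINKING_SHARE = 0.60
trimmed_output = round(OUTPUT_TOKENS * (1 - 0.30 * THINKING_SHARE))
trimmed_cost = session_cost("K2.7-Code (Kimi API)", INPUT_TOKENS, trimmed_output)
print(f"With 30% fewer thinking tokens: ${trimmed_cost:.2f}")
```

Because input tokens are priced closer together (about 3x to 4x), the overall saving on a real workload depends on its input-to-output ratio.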

If you’re already using Claude Code, you can route it through Moonshot’s API with two environment variables:

export ANTHROPIC_BASE_URL="https://api.moonshot.ai/v1"
export ANTHROPIC_AUTH_TOKEN="your_kimi_api_key"

Your Claude Code workflow stays identical; the backend switches to K2.7-Code. Retrieve your key from platform.kimi.ai.

For self-hosting: the full INT4-quantized weights are at moonshotai/Kimi-K2.7-Code on HuggingFace under a Modified MIT license that permits commercial use. The realistic hardware requirement is an 8-GPU node with roughly 640GB of VRAM. Note that ~640GB matches eight 80GB H100-class cards; eight H200s at 141GB each would provide about 1.1TB. Community GGUF builds from Unsloth work with llama.cpp, Ollama, and LM Studio for more modest setups. Deployment uses vLLM 0.19.1+ or SGLang with the kimi_k2 tool-call parser flag.
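A back-of-the-envelope check shows why INT4 matters for the ~640GB figure quoted above. This sketch counts weight storage only. It ignores quantization scales, any layers kept at higher precision, the KV cache, and activations, so treat the results as lower bounds rather than a sizing guide.

```python
def weight_gb(params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 1e12  # 1 trillion parameters, from the article
VRAM_GB = 640        # the ~640GB figure quoted above

int4_gb = weight_gb(TOTAL_PARAMS, 4)    # 4-bit quantized weights
bf16_gb = weight_gb(TOTAL_PARAMS, 16)   # 16-bit weights, for comparison
headroom_gb = VRAM_GB - int4_gb         # left for KV cache and activations

print(f"INT4: {int4_gb:.0f} GB, BF16: {bf16_gb:.0f} GB, headroom: {headroom_gb:.0f} GB")
# prints "INT4: 500 GB, BF16: 2000 GB, headroom: 140 GB"
```

At 16-bit precision the weights alone would not fit in 640GB, which is consistent with the release shipping INT4 weights.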

Who Should Try It Now

If you’re running cost-sensitive agent loops that currently use Sonnet 4.6 or another proprietary model, K2.7-Code is worth evaluating immediately. The price delta is large enough to justify the test even without independent benchmark validation.

If you need confidence from independent SWE-Bench or Terminal-Bench results before committing to a production switch, wait a few weeks. The community will have those numbers soon.

If your use case is anything other than long-horizon agentic coding, K2.6 or a different model is probably a better fit. K2.7-Code’s “always thinking” constraint makes it the wrong tool for fast, lightweight tasks — no matter how good the benchmark headlines look.

The open weights are there, the license is permissive, and the efficiency gains appear real. The benchmark claims need outside verification before they become gospel. Both things are true simultaneously.
