cd /news/large-language-models/glm-5-2-open-source-750b-params-mit-… · home topics large-language-models article
[ARTICLE · art-42330] src=byteiota.com ↗ pub= topic=large-language-models verified=true sentiment=↑ positive

GLM-5.2 Open Source: 750B Params, MIT License, 1M Context

Z.ai open-sourced GLM-5.2 on June 17 under an MIT license, a 744B-parameter sparse MoE model with a 1M-token context window that outperforms GPT-5.5 on multiple coding benchmarks while costing about one-sixth the price. The model scores 62.1 on SWE-bench Pro versus GPT-5.5's 58.6, and its API costs $2.40 per million tokens blended compared to GPT-5.5's $13.33, offering a viable open-source alternative for coding agents.

read4 min views1 publishedJun 28, 2026
GLM-5.2 Open Source: 750B Params, MIT License, 1M Context
Image: Byteiota (auto-discovered)

Z.ai open-sourced GLM-5.2 on June 17 under an MIT license — full commercial use, no royalties, no acceptable-use restrictions. The model scores 62.1 on SWE-bench Pro against GPT-5.5’s 58.6, and the API costs $2.40 per million tokens blended versus $13.33 for GPT-5.5. If you are running coding agents at OpenAI prices, you now have a real alternative you can download, self-host, and fine-tune on your own data today.

What GLM-5.2 Actually Is #

GLM-5.2 is a 744B-parameter sparse Mixture-of-Experts model — roughly 40B parameters activate per token, keeping inference costs well below what the headline number implies. It has a 1-million-token context window built for long-horizon agentic tasks: large codebase analysis, full-repo debugging, regulatory document review. Z.ai built it explicitly as a coding agent flagship, and the benchmarks back that up.

The technical feature that makes the 1M context economically viable is IndexShare — a sparse attention optimization that reuses the same token index across every four layers instead of recomputing it per layer. This cuts per-token FLOPs by 2.9x at 1M context. The result is that running a million-token prompt does not cost disproportionately more than a short one, which has historically killed long-context adoption at scale.

The Benchmark Numbers #

Here is how GLM-5.2 compares against GPT-5.5 on the benchmarks that matter for agentic work:

Benchmark GLM-5.2 GPT-5.5 Winner
SWE-bench Pro 62.1 58.6 GLM-5.2
FrontierSWE 74.4% 72.6% GLM-5.2
PostTrainBench 34.3% 25.0% GLM-5.2
MCP-Atlas (tool use) 77.0 75.3 GLM-5.2
Terminal-Bench 2.1 81.0 84.0 GPT-5.5

SWE-bench Pro tests against real GitHub issues with full repository context — not synthetic puzzles. GLM-5.2 leads on all four agentic coding benchmarks and trails only on Terminal-Bench, which skews toward general-purpose terminal tasks. For agent-driven coding specifically, GLM-5.2 now holds the lead on most open benchmarks.

The Cost Gap Is the Real Story #

GLM-5.2’s API runs at $1.40 per million input tokens and $4.40 output — blended at a 2:1 ratio, that is $2.40 per million. GPT-5.5 comes in at $5.00 input and $30.00 output, or $13.33 blended. At 100,000 requests per day on average 3,000-token prompts, that works out to $21,600 per month versus $120,000. At scale, that difference changes the economics of AI-powered products.

Self-hosting removes the per-token cost entirely. The FP8 weights are on HuggingFace at zai-org/GLM-5.2-FP8 and run on vLLM, SGLang, or transformers. You will need around 800GB of NVMe storage. The MIT license means you can fine-tune on proprietary data, run air-gapped, and commercialize the output with no royalties and no approval from Z.ai. If Z.ai changes its pricing tomorrow, your self-hosted deployment is unaffected.

huggingface-cli download zai-org/GLM-5.2-FP8 --local-dir ./glm5-2-fp8 --repo-type model

Drop-In Compatibility With Your Current Tools #

Z.ai ships an OpenAI-compatible API endpoint. If you are already using Claude Code, Cline, Roo Code, Goose, OpenCode, Crush, OpenClaw, or Kilo Code, switching to GLM-5.2 is a base-URL change in your config — no SDK swap, no code rewrite. Vercel integrated it into their AI Gateway within three days of the June 13 release. Guillermo Rauch described the coding output as “genuinely impressed, almost shocked.” A three-day turnaround from open-source release to production integration is not a normal thing.

What It Does Not Do #

GLM-5.2 has no vision support — text and code only. If your workflows depend on image input or multimodal reasoning, it is not a replacement for GPT-4o or Claude Opus 4.8 in those scenarios. The model has significant Chinese-language training data; for tasks requiring deep linguistic nuance in European languages, test it against your specific workload before committing. And self-hosting 744B parameters is not a weekend project — you need real infrastructure to support it.

The Bigger Pattern #

GLM-5.2 is the third open-source release in 18 months to genuinely close the gap with frontier proprietary models — after DeepSeek R1 for reasoning and DSpark for inference speed. Each follows the same pattern: a lab open-sources something that should not be free at that quality level, the developer community stress-tests it within days, and proprietary providers respond with price cuts. That cycle is accelerating, and GLM-5.2 makes the case that you do not need to pay premium closed-model prices to run competitive coding agents. The weights are available now.

── more in #large-language-models 4 stories · sorted by recency
── more on @z.ai 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/glm-5-2-open-source-…] indexed:0 read:4min 2026-06-28 ·