Z.ai Releases GLM-5.2 With 1M-Token Context

Z.ai released GLM-5.2, an open-weights flagship model with 753 billion parameters and a 1,000,000-token context, designed for long-horizon coding tasks. The model introduces IndexShare, an architectural optimization that reportedly reduces per-token FLOPs by 2.9× at full context, and scored 81.0 on Terminal-Bench 2.1, close to the 85.0 reported for Claude Opus 4.8. Core weights are MIT-licensed on Hugging Face; Coding Plan subscribers received access on June 13, with wider availability starting June 16.

Per Z.ai's public repository, GLM-5.2 is an open-weights flagship model designed for long-horizon coding tasks and supports a 1,000,000-token context (Z.ai GitHub). VentureBeat reports the model has 753 billion parameters and introduces an architectural optimization called IndexShare that reduces per-token FLOPs by 2.9× at the 1M context length (VentureBeat; Z.ai GitHub). Z.ai published MIT-licensed core weights on Hugging Face and made the model available to Coding Plan subscribers on June 13, with wider releases and benchmarks arriving June 16, according to DigitalApplied and VentureBeat. Multiple outlets report benchmark results: GLM-5.2 scored 81.0 on Terminal-Bench 2.1 versus 85.0 for Claude Opus 4.8, and coverage notes it challenges proprietary models on long-horizon coding workloads (Z.ai GitHub; Computerworld; VentureBeat).

What happened

Per Z.ai's GitHub repository, GLM-5.2 is the lab's new flagship model for long-horizon tasks and supports a 1,000,000-token context (Z.ai GitHub). VentureBeat reports the model contains 753 billion parameters and that Z.ai published the core weights under an MIT license on Hugging Face, enabling unrestricted commercial modification and redistribution (VentureBeat; Hugging Face listing; Z.ai GitHub). DigitalApplied documents the release sequence: the model went live to GLM Coding Plan subscribers on June 13, with the standalone API, open weights, and benchmark results published around June 16 (DigitalApplied).

Technical details

Per Z.ai's documentation, GLM-5.2 introduces an architectural technique called IndexShare, which reuses a single indexer across every four sparse-attention layers and reportedly reduces per-token compute by 2.9× at the 1M-token context length (Z.ai GitHub; VentureBeat). The repo and press coverage also highlight an improved Multi-Token Prediction (MTP) layer that increases the accepted length for speculative decoding by up to 20% (Z.ai GitHub; VentureBeat).

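The indexer-sharing pattern described for IndexShare can be illustrated with a minimal Python sketch. It assumes the four layers that share an indexer are contiguous, and it uses a hypothetical count of 12 sparse-attention layers; neither detail is confirmed by the coverage. The function names are invented for illustration and are not part of any Z.ai API. The sketch only counts indexer passes and does not attempt to reproduce the reported 2.9× figure, which depends on costs not described here.

```python
from math import ceil

# From the article: one indexer is reused across every four sparse-attention layers.
GROUP_SIZE = 4

# Hypothetical layer count, chosen only for illustration (not from the article).
NUM_SPARSE_LAYERS = 12


def indexer_for_layer(layer_idx: int, group_size: int = GROUP_SIZE) -> int:
    """Return the id of the shared indexer serving a sparse-attention layer.

    Assumes contiguous grouping: layers 0-3 share indexer 0, layers 4-7
    share indexer 1, and so on.
    """
    if layer_idx < 0:
        raise ValueError("layer_idx must be non-negative")
    return layer_idx // group_size


def indexer_passes(num_layers: int, group_size: int = GROUP_SIZE) -> int:
    """Count indexer passes per token when each group shares one indexer."""
    return ceil(num_layers / group_size)


if __name__ == "__main__":
    mapping = {layer: indexer_for_layer(layer) for layer in range(NUM_SPARSE_LAYERS)}
    print("layer -> indexer:", mapping)
    print(
        indexer_passes(NUM_SPARSE_LAYERS),
        "indexer passes with sharing, versus",
        NUM_SPARSE_LAYERS,
        "with one indexer per layer",
    )
```

Under these assumptions the number of indexer passes falls by the group size, but the end-to-end saving would be smaller because the attention and feed-forward work in each layer is unchanged.
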
Z.ai's published scorecard lists GLM-5.2 at 81.0 on Terminal-Bench 2.1; Z.ai's materials compare that result to 85.0 for Claude Opus 4.8 on the same benchmark (Z.ai GitHub; Computerworld).

Industry context

Editorial analysis: Companies releasing large-context, open-weight models create practical options for enterprises that prioritize local hosting, customization, or regulatory resilience. Open licensing plus a 1M-token context materially lowers the friction for repository-scale engineering workflows, according to vendor publications and platform listings (VentureBeat; Hugging Face; Computerworld).

Comparative performance and cost framing

Reporting by VentureBeat frames GLM-5.2 as competitive with closed-source frontier models on long-horizon coding benchmarks while offering a different cost and deployment trade-off, because the weights are open and the architecture is optimized for low per-token FLOPs (VentureBeat). Computerworld and Z.ai's repository material emphasize that GLM-5.2 ranks close to Anthropic's Claude Opus 4.8 on FrontierSWE/Terminal-Bench metrics and that the model edges some proprietary models on selected long-horizon coding benchmarks (Computerworld; Z.ai GitHub).

What this means for practitioners

For practitioners: Open weights under an MIT license, combined with a documented 1M-token context, shift the engineering trade-offs for toolchains that must reason across large codebases or long sessions. Teams evaluating repository-scale agents can now benchmark a frontier-capability model locally or in private cloud instances without vendor API constraints, per public availability on Hugging Face and provider integration notes (Hugging Face; VentureBeat; Fireworks.ai announcement).

Limitations and rollout notes

Per DigitalApplied's coverage, Z.ai's initial distribution prioritized Coding Plan subscribers before publishing independent benchmarks, so early availability preceded broad third-party validation (DigitalApplied).

Observed benchmark numbers come from Z.ai's published scorecards and platform rankings; independent, peer-reviewed evaluations are limited at time of publication (Z.ai GitHub; Arena board reports cited by DigitalApplied).

What to watch

For practitioners: follow independent benchmark replications on Terminal-Bench 2.1 and FrontierSWE, third-party evaluations of long-horizon stability under adversarial prompts, and adoption reports from inference platform partners (Arena, Hugging Face, third-party inference providers). Also monitor tooling support for 1M-token contexts in popular agent frameworks and the practical memory/latency trade-offs on real-world hardware when using GLM-5.2 at scale.

Bottom line

Per multiple vendor documents and trade press, GLM-5.2 is an open-weight, MIT-licensed model with a 1M-token context and architectural optimizations that materially reduce per-token compute at extreme context lengths; early benchmarks place it close to the closed-source frontier on long-horizon coding tasks, while independent replication and production-scale metrics remain the immediate next steps for practitioners (Z.ai GitHub; VentureBeat; Computerworld; DigitalApplied).

Scoring Rationale

An MIT-licensed, open-weights model with a stable 1M-token context and competitive long-horizon coding benchmarks materially affects deployment options and cost calculus for engineering teams. The release is industry-significant but still needs independent replication and production metrics.