GLM-5.2 vs Claude Opus

wpnews.pro

GLM-5.2 just came out, and it's another step forward for what open models can do.

Naturally, the internet freaked out. There's a lot of hype around it right now, and it can be hard to tell what the model actually is, how you can use it, and what it can and can't do.

This guide helps you navigate the hype. We'll show you what people are saying, the pros and the cons, then run our own vibe test pitting Claude Opus against GLM-5.2.

Here's a preview of the two games the models built. Both are browser games written from scratch, with no game engine or 3D rendering library like Three.js. The 3D models are provided by Kenney.

What Opus made

What GLM-5.2 made

What is GLM-5.2 #

GLM-5.2 is Z.ai's latest flagship model. It's open weights under an MIT license, so you can download it, run it yourself, or call it through Z.ai's API.

It's built for long-horizon tasks, the kind of multi-step coding-agent work that runs for hours. It ships with a 1M-token context window and two thinking effort levels, High and Max, that trade speed for capability.

GLM-5.2 is text-only, not multimodal. It can't read images, so workflows built around screenshots or diagrams still need a model like Claude Opus.

Z.ai positions it roughly between Claude Opus 4.7 and 4.8 at similar token usage. Here's their announcement, if you want to read more:

[@Zai_org on X]

Pricing and access

Because it's open weights, GLM-5.2 is cheap. Through an API it costs a fraction of Opus, and you can run it yourself for free if you have the hardware.

Pricing, per 1M tokens (vendor docs):

| Model | Input | Cache read | Output |
|---|---|---|---|
| Claude Opus 4.8 | $5 | $0.50 | $25 |
| GLM-5.2 | $1.4 | $0.26 | $4.4 |

On output tokens, GLM-5.2 is less than a fifth the price of Opus.
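
To make the comparison concrete, here is a small sketch that computes the price ratios from the pricing figures above and estimates what a job would cost on each model. Only the per-million prices come from the vendor figures. The function names and the token counts in the example job are made up for illustration.

```python
# Prices in USD per 1M tokens, copied from the pricing figures above.
PRICES = {
    "Claude Opus 4.8": {"input": 5.00, "cache_read": 0.50, "output": 25.00},
    "GLM-5.2": {"input": 1.40, "cache_read": 0.26, "output": 4.40},
}


def job_cost(model: str, input_tokens: int, cache_read_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost in USD for one job on the given model."""
    p = PRICES[model]
    total = (
        input_tokens * p["input"]
        + cache_read_tokens * p["cache_read"]
        + output_tokens * p["output"]
    )
    return total / 1_000_000


def price_ratio(kind: str) -> float:
    """GLM-5.2 price divided by the Opus price for one kind of token."""
    return PRICES["GLM-5.2"][kind] / PRICES["Claude Opus 4.8"][kind]


for kind in ("input", "cache_read", "output"):
    print(f"{kind}: GLM-5.2 is {price_ratio(kind):.0%} of the Opus price")

# Hypothetical job: 2M input, 10M cache-read, and 200k output tokens.
for model in PRICES:
    print(model, round(job_cost(model, 2_000_000, 10_000_000, 200_000), 2))
```

The ratio depends on the kind of token: output is about 18% of the Opus price, input about 28%, and cache reads about 52%. The saving on a real agent run therefore depends on its mix of tokens.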

The weights are on Hugging Face and ModelScope under an MIT license, with no regional restrictions. You can serve it locally with frameworks like vLLM, SGLang, or Transformers.

The benchmarks

Z.ai published these benchmark numbers alongside the release, on its model card.

| Benchmark | GLM-5.2 | Opus 4.8 | GPT-5.5 | Gemini 3.1 Pro |
|---|---|---|---|---|
| Reasoning | | | | |
| HLE | 40.5 | 49.8* | 41.4* | 45 |
| HLE (w/ tools) | 54.7 | 57.9* | 52.2* | 51.4* |
| AIME 2026 | 99.2 | 95.7 | 98.3 | 98.2 |
| GPQA-Diamond | 91.2 | 93.6 | 93.6 | 94.3 |
| IMOAnswerBench | 91.0 | 83.5 | – | 81 |
| Coding | | | | |
| SWE-bench Pro | 62.1 | 69.2 | 58.6 | 54.2 |
| NL2Repo | 48.9 | 69.7 | 50.7 | 33.4 |
| DeepSWE | 46.2 | 58 | 70 | 10 |
| ProgramBench | 63.7 | 71.9 | 70.8 | 39.5 |
| Terminal Bench 2.1 (Terminus-2) | 81.0 | 85 | 84 | 74 |
| Terminal Bench 2.1 (best harness) | 82.7 | 78.9 | 83.4 | 70.7 |
| SWE-Marathon | 13.0 | 26.0 | 12.0 | 4.0 |
| Agentic | | | | |
| MCP-Atlas (public) | 76.8 | 77.8 | 75.3 | 69.2 |
| Tool-Decathlon | 48.2 | 59.9 | 55.6 | 48.8 |

An independent run by [ArtificialAnalysis](https://artificialanalysis.ai/articles/glm-5-2-is-the-new-leading-open-weights-model-on-the-artificial-analysis-intelligence-index) broadly agrees:

- Intelligence Index v4.1: 51 (leading open-weights; MiniMax-M3 44, DeepSeek V4 Pro 44, Kimi K2.6 43).
- TerminalBench v2.1: 78% (vs 81 / 82.7 on the model card — different harness).
- Output tokens per task: ~43k (GLM-5.1: 26k).

These benchmarks span three areas: reasoning (hard math and science exams), coding (fixing bugs and building whole projects), and agentic tool use (calling and chaining real tools). For what each one tests, see the benchmark notes at the end.

Navigating the hype #

It can be hard to tell what's real and what isn't online these days. So we compiled a couple of real-world examples to give you the general vibe of what people are saying about GLM-5.2.

"It keeps up with the top closed models"

This tweet compares GLM-5.2 against Claude Opus 4.8 (high), Claude Fable 5, and GPT-5.5 (high). The video shows each model rendering a 3D scene and building a few assets from scratch.

[@OmedVibeCodes on X]

The takeaway people draw is that an open model now lands near the best closed models in the world.

But this is also the kind of thing that shades into astroturfing. The constraints aren't clear, and it's not obvious the task really pits the models against each other.

So treat it as a vibe, not a result. It's a basic demo that impresses on sight, with no technical scrutiny required.

A lot of what you'll see online is exactly this.

"This model is insane at design"

Another common sentiment is that it's strong at user-interface design, on par with the top closed models. This tweet had GLM-5.2 and Opus 4.8 each build a landing page.

[@nutlope on X]

The two are hard to tell apart. Design is subjective, so have a look yourself.

It also flags the price: the GLM build cost $0.06 against Opus's $0.49, about eight times cheaper, and it finished faster. That cheap-and-open angle is a big part of why people are hyped.

"It can't read images"

Not all the talk is positive. This tweet points out that GLM-5.2 can't read an attached image, because it isn't multimodal.

[@maria_rcks on X]

Models like Claude Opus take images natively, which matters for workflows built around screenshots, diagrams, or design mockups.

We ran our own vibe test #

To cut through the vibes, we ran our own test. We gave Opus 4.8 and GLM-5.2 the same one-shot prompt: build a 3D platformer game from scratch, in raw WebGL, with no game engine or 3D library.

To finish, each model had to build:

- A 3D engine and renderer in raw WebGL, with no Three.js or any other library.
- A loader for the supplied 3D character and world models.
- A character that runs and jumps around an arena, with gravity and collision.
- A follow camera and keyboard controls.
- The whole thing runnable in the browser with one command.

That stresses a few capabilities at once:

- Long-horizon work: holding a layered, multi-file project together over many steps.
- Hard reasoning and code taste: getting the subtle engine internals right, the parts that look fine but quietly break.
- Correctness over looks: whether the rendering and physics actually work on screen, not just a pretty page.

Both got the same prompt, the same assets, and one attempt with no hints. The 3D models are free CC0 assets from Kenney.

We ran Opus 4.8 with extended thinking on high and GLM-5.2 with thinking set to High. High is the lower of GLM-5.2's two effort levels, so its Max setting went untested.

How long it took, and what it cost #

Opus 4.8 built in Claude Code; GLM-5.2 built in Pi over OpenRouter. Here's how the two runs compared on time, tokens, and cost.

*Side-by-side timelapse. Opus finishes at 34:00, GLM-5.2 at 1:11.*

| Metric | Opus (Claude Code) | GLM-5.2 (Pi/OpenRouter) |
|---|---|---|
| Wall-clock build time | 33m 30s | 1h 10m 40s |
| Output tokens | 216,809 | 131,000 |
| Peak context window | 19% of 1M | 16% of 1M |
| Tool calls | 153 | 128 |
| Cost | ~$21.92 (estimate, list pricing) | $5.39 (real billed) |

Opus finished in under half the time. GLM-5.2 cost about a quarter as much.
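
The ratios behind that summary are easy to check. The sketch below recomputes them from the run figures reported above. The `Run` class and its field names are our own illustration, and the Opus cost is an estimate at list pricing rather than a billed amount.

```python
from dataclasses import dataclass


@dataclass
class Run:
    name: str
    seconds: int        # wall-clock build time
    output_tokens: int
    tool_calls: int
    cost_usd: float


# Figures from the run comparison above.
opus = Run("Opus 4.8", 33 * 60 + 30, 216_809, 153, 21.92)
glm = Run("GLM-5.2", 1 * 3600 + 10 * 60 + 40, 131_000, 128, 5.39)

time_ratio = glm.seconds / opus.seconds      # how much longer GLM-5.2 took
cost_ratio = opus.cost_usd / glm.cost_usd    # how much more Opus cost

print(f"GLM-5.2 took {time_ratio:.2f}x as long as Opus")
print(f"Opus cost {cost_ratio:.2f}x as much as GLM-5.2")
for run in (opus, glm):
    per_call = run.output_tokens / run.tool_calls
    print(f"{run.name}: {per_call:.0f} output tokens per tool call")
```

By these figures GLM-5.2 took about 2.1 times as long, and Opus cost about 4.1 times as much.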

Playtesting both games #

We played both games start to finish. Here's how each one held up.

Opus

Opus's game plays well.

Opus, start to finish.

From the playthrough:

- The camera and controller work.
- One obstacle sits off the player's path, which is a little odd.
- The spike hazard kills the player, so that logic is correct.
- It looks good overall, and you can reach the flag and win. There's a real win condition.

The animations look good and run smoothly, with textures applied properly.

Opus: animations, textures, controller working.

GLM-5.2

GLM-5.2's game is rougher.

GLM-5.2, start to finish.

From the playthrough:

- It doesn't look as good overall.
- The character is missing some of its materials.
- The spike hazard doesn't kill the character.
- Reaching the flag does nothing. There's no win condition.

So it's not that great. It did nail one thing, though: the spring.

GLM-5.2 spring launch.

You can jump on the spring and launch up to the next platform.

How each model checked its own work #

The task told both models to verify their own work before stopping. They differed in one way: Opus is multimodal and can read images, while GLM-5.2 is text-only.

Opus could see its output

Opus is multimodal. Its verification test rendered the game and saved a screenshot "for visual confirmation," and the final result shows a clean HUD with the debug readouts cleared.

Opus's screenshot: clean HUD, debug readouts removed.

GLM-5.2 checked the numbers

GLM-5.2 is text-only. It verified through console logs and an on-screen debug readout: FPS, position, grounded state, animation, coins, deaths.

GLM-5.2's final screenshot: the debug overlay is still on. It never saw the frame.

The numbers all checked out, so it stopped. It couldn't tell the debug text was still sitting over the game.

The trade-off

A model that can read images can review its own visual output. It can catch problems that never reach the logs: leftover debug text, bad framing, a model rendered gray instead of textured.

A text-only model checks its work through numbers and console output. That covers non-visual logic, but it misses anything you have to look at, which is how GLM-5.2 left its debug overlay in the final build.
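
A text-only check can still catch some visual problems, but only if the game reports that state in its logs. The sketch below illustrates the gap. It is not what either model ran: the snapshot fields (`debug_overlay_visible`, `untextured_meshes`, and the rest) and the thresholds are hypothetical.

```python
from typing import Dict, List, Union

Snapshot = Dict[str, Union[int, float, bool]]


def gameplay_checks(s: Snapshot) -> List[str]:
    """Checks a text-only agent can make from logged numbers alone."""
    problems = []
    if s["fps"] < 30:
        problems.append("frame rate below 30 FPS")
    if s["player_y"] < -50:
        problems.append("player fell out of the world")
    return problems


def presentation_checks(s: Snapshot) -> List[str]:
    """Checks on what is drawn, which only help if the game logs this state."""
    problems = []
    if s["debug_overlay_visible"]:
        problems.append("debug overlay still visible")
    if s["untextured_meshes"] > 0:
        problems.append("meshes drawn without a texture")
    return problems


# Hypothetical final-build snapshot: healthy numbers, unfinished visuals.
final_build: Snapshot = {
    "fps": 60,
    "player_y": 1.0,
    "debug_overlay_visible": True,
    "untextured_meshes": 2,
}

print(gameplay_checks(final_build))       # nothing to report
print(presentation_checks(final_build))   # two problems a screenshot would show
```

The gameplay checks pass while the presentation checks fail, which mirrors how a build can look finished in the logs and unfinished on screen.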

The bugs #

Both games had bugs. Here's what broke in each.

GLM-5.2

GLM-5.2's bugs were more frequent and more visible, and several broke fundamentals.

The character faces the wrong way

It walks in the right direction, but the model is turned backwards the whole time.
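
We did not trace the cause in GLM-5.2's code. One common source of this symptom is a mismatch between the direction the asset was authored to face and the direction the engine assumes. Here is a minimal sketch of that idea, assuming yaw 0 faces +Z and positive yaw turns toward +X. The function names and the `model_forward_offset` parameter are ours.

```python
import math


def wrap_angle(a: float) -> float:
    """Wrap an angle in radians into the range [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def facing_yaw(move_x: float, move_z: float, model_forward_offset: float = 0.0) -> float:
    """Yaw about the Y axis that points the character along its movement.

    Assumes yaw 0 faces +Z and positive yaw turns toward +X.
    model_forward_offset corrects assets whose forward axis differs.
    """
    return wrap_angle(math.atan2(move_x, move_z) + model_forward_offset)


# Moving along +X with a model authored facing +Z: a quarter turn.
print(facing_yaw(1.0, 0.0))
# Same movement, but the asset faces -Z: add half a turn to compensate.
print(facing_yaw(1.0, 0.0, math.pi))
```

If the offset is left out for an asset that faces the other way, movement stays correct but the character is drawn turned half a turn around, which matches what we saw.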

Missing textures and a disappearing head

The character renders flat gray instead of textured, and its head vanishes whenever the camera moves.

The death spike doesn't kill

The character lands right on a spike hazard and nothing happens. No death, no reset.
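
For reference, a working hazard check can be very small. The sketch below uses axis-aligned bounding boxes, a common choice for this kind of game. It is an illustration under our own names (`Box`, `touched_hazard`) and coordinates, not code from either build.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box described by its min and max corners."""
    lo: Vec3
    hi: Vec3


def overlaps(a: Box, b: Box) -> bool:
    """True when the boxes overlap or touch on all three axes."""
    return all(a.lo[i] <= b.hi[i] and b.lo[i] <= a.hi[i] for i in range(3))


def touched_hazard(player: Box, hazards: List[Box]) -> bool:
    """True when the player overlaps any hazard, which should trigger a death."""
    return any(overlaps(player, h) for h in hazards)


spikes = [Box((4.0, 0.0, 4.0), (5.0, 0.5, 5.0))]
on_spike = Box((4.2, 0.5, 4.2), (4.8, 2.3, 4.8))   # feet resting on the spike top
far_away = Box((0.0, 0.0, 0.0), (0.6, 1.8, 0.6))

print(touched_hazard(on_spike, spikes))
print(touched_hazard(far_away, spikes))
```

The comparison is inclusive, so a character resting exactly on top of the spike still counts as a hit. A strict comparison at that boundary is one way a landing could go undetected.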

Opus

Opus's bugs were fewer and subtler, edge cases rather than broken basics.

Standing on thin air

The character can sit beside a platform, in mid-air, without falling. A collision edge case.

Winning from too far away

The win triggers while the character is still well short of the flag.
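
A trigger like this usually comes down to a distance test against a radius, and a radius that is too generous produces exactly this behavior. A minimal sketch, assuming a spherical trigger around the flag. The positions and radii are invented for illustration and are not taken from Opus's code.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def reached_flag(player: Vec3, flag: Vec3, trigger_radius: float) -> bool:
    """True when the player is within trigger_radius of the flag."""
    return math.dist(player, flag) <= trigger_radius


flag = (10.0, 0.0, 0.0)
player = (6.0, 0.0, 0.0)   # 4 units short of the flag

print(reached_flag(player, flag, trigger_radius=5.0))   # generous radius
print(reached_flag(player, flag, trigger_radius=1.0))   # tight radius
```

With the generous radius the win fires while the player is still 4 units away. Tightening the radius makes the player actually reach the flag.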

The verdict #

So, is the hype real? Mostly.

GLM-5.2 is a genuinely strong open model, at a fraction of Opus's price. For a lot of work, that combination is hard to beat.

But it isn't Opus. In our test, Opus was faster, shipped a cleaner and more correct game, and could check its own work by looking at it.

GLM-5.2 was far cheaper, but rougher, and it's text-only.

Use GLM-5.2 when cost and openness matter and the work is mostly text and logic. Use Opus when correctness, polish, and visual judgment matter, and you'll pay for it.

What the benchmarks measure #

HLE: Humanity's Last Exam. Thousands of expert-level questions across many subjects, built to be extremely hard.

HLE (w/ tools): The same exam, but the model can use tools like web search and code.

AIME 2026: A hard American high-school math competition.

GPQA-Diamond: Graduate-level science questions written so they can't be answered with a quick search.

IMOAnswerBench: Math-olympiad-style problems, scored on the final answer.

SWE-bench Pro: Fixing real issues in real codebases, often with changes across several files.

NL2Repo: Building a whole, runnable codebase from a single written spec.

DeepSWE: Agentic software-engineering tasks in a sandboxed container with no internet.

ProgramBench: Rebuilding a full program from only its compiled binary and documentation, with no source or spec given.

Terminal Bench 2.1: Tasks completed through a real terminal. The two rows use a fixed harness (Terminus-2) and each model's best harness.

SWE-Marathon: Twenty ultra-long-horizon engineering tasks, each running for hours.

MCP-Atlas: Tool-use tasks run against real MCP servers, each needing several tool calls.

Tool-Decathlon: Long-horizon tasks across many real apps, each needing a long chain of tool calls.

Source: techstackups.com (original article)
