GLM 5.2 Beats Claude in Security Benchmark


Source: devclubhouse.com · Published: 2026-06-28T23:04Z · Topic: large-language-models


Zhipu AI's open-weight GLM 5.2 model achieved a 39% F1 score in Semgrep's IDOR detection benchmark, ahead of Claude Code's 32% and, in raw prompting, Claude Opus 4.8. The MIT-licensed model can run locally, which enables code analysis for organizations with compliance constraints, and its Mixture-of-Experts architecture keeps inference costs low: Semgrep estimated $0.17 per bug found.

4 min read · Published Jun 28, 2026


Zhipu AI's open-weight model outshines proprietary giants in detecting complex access control vulnerabilities without leaking code.

Mariana Souza

Finding security vulnerabilities in code is one of the most demanding tasks we hand over to large language models. Unlike simple syntax checks, identifying logical flaws like Insecure Direct Object References (IDORs) requires a deep understanding of authorization boundaries, routing, and state. For a long time, conventional wisdom said you needed massive, proprietary frontier models behind expensive APIs to even stand a chance.

A recent benchmark from security platform Semgrep turned that assumption on its head. In a head-to-head evaluation of IDOR detection, GLM 5.2, an open-weight model from Zhipu AI, achieved a 39% F1 score. That comfortably surpassed Claude Code, which posted a 32% F1 score, and even outpaced Claude Opus 4.8 in raw prompting scenarios.

This is a massive moment for security teams. For organizations that cannot leak proprietary codebases to external APIs due to compliance or privacy constraints, the arrival of a highly capable, MIT-licensed model that runs locally changes the math entirely.

The Architecture of GLM 5.2

Zhipu AI rolled out GLM 5.2 to its coding plan members on June 13, 2026, and released the open weights under an MIT license on June 16, 2026.

Under the hood, GLM 5.2 is a Mixture-of-Experts (MoE) model. It boasts roughly 750 billion total parameters, but only activates about 40 billion parameters per token. This design keeps inference costs remarkably low. In Semgrep's testing, GLM 5.2 found vulnerabilities at an estimated cost of just $0.17 per bug.
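As a rough illustration of why sparse activation keeps per-token compute low, the figures quoted above can be turned into a quick calculation. The parameter counts come from the article; the "about 2 FLOPs per active parameter per token" rule is a common back-of-envelope assumption, not a measured number for GLM 5.2.

```python
# Approximate figures quoted in the article.
TOTAL_PARAMS = 750e9   # total parameters across all experts
ACTIVE_PARAMS = 40e9   # parameters activated per token

# Share of the weights that participate in any single token's forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Assumption: a forward pass costs roughly 2 FLOPs per active parameter per token.
moe_flops_per_token = 2 * ACTIVE_PARAMS
dense_flops_per_token = 2 * TOTAL_PARAMS  # hypothetical dense model of equal size
compute_ratio = dense_flops_per_token / moe_flops_per_token

print(f"Active share of weights per token: {active_fraction:.1%}")
print(f"Dense-to-MoE compute ratio per token: {compute_ratio:.2f}x")
```

This only estimates compute per token. It says nothing about memory, which still scales with the total parameter count.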

Equally important for security audits is the model's expanded context window, which now stretches to 1 million tokens, up from 200K. Security analysis is rarely self-contained. To find an IDOR, a model must trace a request from an HTTP controller, through middleware checks, down to the database query, often spanning dozens of files. Zhipu AI designed this context window to remain reliable across long, complex agent trajectories, ensuring the model does not lose the thread when parsing deeply nested codebases.
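To make the controller-to-query trace concrete, here is a minimal sketch of what an IDOR looks like once the pieces are collapsed into one place. Everything in it (`Invoice`, `INVOICES`, `get_invoice_vulnerable`, `get_invoice_safe`) is hypothetical and exists only for illustration; it is not code from the benchmark.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Invoice:
    invoice_id: int
    owner_id: int
    amount_cents: int


# Hypothetical in-memory "database" keyed by invoice id.
INVOICES = {
    1: Invoice(invoice_id=1, owner_id=100, amount_cents=2500),
    2: Invoice(invoice_id=2, owner_id=200, amount_cents=9900),
}


def get_invoice_vulnerable(current_user_id: int, invoice_id: int) -> Optional[Invoice]:
    """IDOR: the caller is authenticated, but ownership is never checked."""
    return INVOICES.get(invoice_id)


def get_invoice_safe(current_user_id: int, invoice_id: int) -> Optional[Invoice]:
    """Scopes the lookup to the caller, so other users' invoices appear absent."""
    invoice = INVOICES.get(invoice_id)
    if invoice is None or invoice.owner_id != current_user_id:
        return None
    return invoice


# User 100 requests invoice 2, which belongs to user 200.
leaked = get_invoice_vulnerable(current_user_id=100, invoice_id=2)
blocked = get_invoice_safe(current_user_id=100, invoice_id=2)
print("vulnerable handler leaked a record:", leaked is not None)
print("safe handler leaked a record:", blocked is not None)
```

In a real codebase the ownership check may live in middleware or a query filter several files away from the handler, which is why long-context tracing matters for this bug class.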

Raw Prompting vs. The Harness

While GLM 5.2's victory over Claude Code is impressive, the benchmark highlights a critical architectural lesson: the model is only as good as the scaffolding around it.

In this evaluation, both models were tested using a basic Pydantic AI harness. They received the same IDOR prompt, a basic search strategy, and pointers on what IDORs look like, but no advanced assistance like endpoint discovery or guided navigation.

When we look at the broader picture, Semgrep's own multimodal pipeline scored between 53% and 61% F1. The difference? Semgrep's pipeline runs inside a custom harness designed specifically for static analysis. This harness does the heavy lifting: it enumerates application endpoints, prunes irrelevant code, and feeds the model only the most critical context.

Figure: IDOR Detection Performance (F1 Score %). Claude Code: 32; GLM 5.2: 39; Semgrep Pipeline (Max): 61.

The data shows that while a superior model provides a better baseline, building a smart, agentic harness around the model is what moves the needle from experimental to production-ready.
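For readers less familiar with the metric, F1 is the harmonic mean of precision and recall, so a model is penalized both for false alarms and for missed bugs. The sketch below computes it from raw counts; the counts are made up for illustration and are not taken from Semgrep's benchmark.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when it is undefined."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if true_positives == 0 or predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted  # share of reported bugs that are real
    recall = true_positives / actual        # share of real bugs that were reported
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
example_f1 = f1_score(true_positives=30, false_positives=50, false_negatives=40)
print(f"F1 = {example_f1:.2f}")
```

Because the harmonic mean is pulled toward the lower of the two values, a scanner cannot reach a high F1 by flagging everything or by reporting only its most certain findings.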

What This Means for Your Security Workflow

For developers looking to adopt AI-driven security scanning, GLM 5.2 offers a compelling path forward.

First, the MIT license means you can host this model on your own infrastructure. If you are working in fintech, healthcare, or any sector with strict data sovereignty rules, sending code to external APIs is often a non-starter. Running GLM 5.2 locally solves this bottleneck.

However, hosting a 750-billion-parameter MoE model is not trivial. Even though only about 40 billion parameters are active per token, any expert can be selected for any token, so you typically still need enough memory to hold all of the weights, plus headroom for the cache that a 1-million-token context window requires. Teams will need to balance the infrastructure costs of running high-end GPUs against the API costs of proprietary models.
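A back-of-envelope sketch of the weight footprint helps size that trade-off. It assumes all experts stay resident in memory (the usual MoE serving setup, since routing can pick any expert), counts weights only, and uses a hypothetical 80 GB accelerator; the precisions listed are illustrative options, not formats confirmed for GLM 5.2.

```python
import math

TOTAL_PARAMS = 750e9   # approximate total parameters, from the article
GPU_MEMORY_GB = 80     # assumption: 80 GB of memory per accelerator


def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal GB (no KV cache or activations)."""
    return total_params * bytes_per_param / 1e9


def min_gpus_for_weights(total_params: float, bytes_per_param: float,
                         gpu_memory_gb: float = GPU_MEMORY_GB) -> int:
    """Lower bound on device count; real deployments need extra headroom."""
    return math.ceil(weight_memory_gb(total_params, bytes_per_param) / gpu_memory_gb)


for label, bytes_per_param in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    gb = weight_memory_gb(TOTAL_PARAMS, bytes_per_param)
    gpus = min_gpus_for_weights(TOTAL_PARAMS, bytes_per_param)
    print(f"{label}: ~{gb:,.0f} GB of weights, at least {gpus} x {GPU_MEMORY_GB} GB devices")
```

These are lower bounds. The cache for long contexts, activations, and framework overhead all add to the total, so actual requirements will be higher.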

To get started, developers should avoid throwing raw code at the model in a single prompt. Instead, mimic the success of Semgrep's multimodal pipeline. Build an agentic workflow that maps out API endpoints, identifies authorization middleware, and extracts only the relevant controller code before feeding it to GLM 5.2.
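One way such a pre-processing step could look is sketched below. It is not Semgrep's harness: it assumes Flask-style `@app.route(...)` decorators, and every name in it (`AUTH_DECORATORS`, `Endpoint`, `enumerate_endpoints`, `build_prompt`, `SAMPLE_APP`) is hypothetical. It uses Python's standard `ast` module to list endpoints, note whether an authorization decorator is present, and pass only path-parameterized handlers on to the model.

```python
import ast
from dataclasses import dataclass
from typing import List

# Hypothetical decorator names that this sketch treats as authorization checks.
AUTH_DECORATORS = {"login_required", "require_owner"}


@dataclass
class Endpoint:
    path: str
    handler: str
    has_auth_decorator: bool
    source: str


def _decorator_name(node: ast.expr) -> str:
    """Return the trailing name of a decorator, e.g. 'route' for @app.route(...)."""
    target = node.func if isinstance(node, ast.Call) else node
    if isinstance(target, ast.Attribute):
        return target.attr
    if isinstance(target, ast.Name):
        return target.id
    return ""


def enumerate_endpoints(source: str) -> List[Endpoint]:
    """Find functions decorated with a Flask-style route and record their context."""
    endpoints = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        path = None
        for dec in node.decorator_list:
            if isinstance(dec, ast.Call) and _decorator_name(dec) == "route" and dec.args:
                first = dec.args[0]
                if isinstance(first, ast.Constant) and isinstance(first.value, str):
                    path = first.value
        if path is None:
            continue
        names = {_decorator_name(dec) for dec in node.decorator_list}
        body = ast.get_source_segment(source, node) or ""
        endpoints.append(Endpoint(path, node.name, bool(names & AUTH_DECORATORS), body))
    return endpoints


def build_prompt(endpoints: List[Endpoint]) -> str:
    """Keep only handlers that take an identifier in the path, then format them."""
    candidates = [e for e in endpoints if "<" in e.path]
    sections = [
        f"Endpoint: {e.path}\nHandler: {e.handler}\n"
        f"Auth decorator present: {e.has_auth_decorator}\n{e.source}"
        for e in candidates
    ]
    return "Review each handler for missing ownership checks (IDOR).\n\n" + "\n\n".join(sections)


SAMPLE_APP = '''
@app.route("/health")
def health():
    return "ok"

@app.route("/invoices/<int:invoice_id>")
@login_required
def get_invoice(invoice_id):
    return db.invoices.get(invoice_id)
'''

prompt = build_prompt(enumerate_endpoints(SAMPLE_APP))
print(prompt)
```

The resulting string would then be sent to a locally hosted model by whatever client you use. A production harness would need framework-specific route discovery and would also follow calls into middleware and data-access code, which this sketch does not attempt.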

The success of GLM 5.2 suggests that open-weight models are no longer the underdogs in specialized, highly complex domains like cybersecurity. By combining the privacy of local execution with performance that, in this benchmark, rivals or exceeds proprietary giants, GLM 5.2 gives developers a powerful new tool to secure their codebases on their own terms.

Sources & further reading

GLM 5.2 beats Claude in our benchmarks (semgrep.dev)

Mariana Souza, Senior Editor

Mariana covers the fast-moving world of machine learning and generative AI, with a particular focus on how these technologies are reshaping development workflows. When she isn't stress-testing the latest foundation models, she's usually at a local hackathon.




