Chinese models are sometimes better, even if they're distilled

[ARTICLE · art-38756] src=dualuse.dev ↗ pub=2026-06-25T03:59Z topic=artificial-intelligence verified=true sentiment=· neutral

Chinese AI models GLM 5.1 and Qwen 3.6 Plus, allegedly distilled from Anthropic's Opus 4.6, outperform the original on internal cybersecurity evaluations, including real-world vulnerability exploitation at a major US bank. This challenges the White House claim that distilled models are strictly inferior, though US frontier models like Opus 4.7 and GPT 5.5 still lead overall.

Dual Use · Noah Lebovic · April 24, 2026

Many folks, including the White House and Anthropic, are accusing Chinese labs of distilling their models from American frontier labs. As part of these accusations, people often claim that distilled models are strictly inferior:

"Models developed from surreptitious, unauthorized distillation campaigns like this do not replicate the full performance of the original. They do, however, enable foreign actors to release products that appear to perform comparably on select benchmarks at a fraction of the cost." [White House memorandum, April 23, 2026]

This was true a few months ago, but it no longer consistently holds. GLM 5.1's cybersecurity capabilities are a good example of this.

GLM 5.1 was released a few weeks ago and is alleged to be distilled from Opus 4.6. It outperformed Opus 4.6 on many public benchmarks, which is, admittedly, still in line with the White House memo. But it also outperformed Opus 4.6 on an internal cybersecurity evaluation in which the model must re-find and exploit vulnerabilities that we had previously found, including an account takeover at a major American bank.

[Figure: GLM 5.1 vs. Claude Opus 4.6]

Performance on ten penetration testing scenarios at 5M and 25M token budgets. A perfect score means the model autonomously found and exploited the vulnerability. See "Benchmarking open-weight models for security research" for methods and additional models.
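To make the scoring described in the caption concrete, here is a minimal sketch of how a solve rate at a given token budget could be computed. It is not the evaluation harness used in the article. The `Attempt` record, the `solve_rate` function, the scenario names, and the numbers are all hypothetical, and the sketch assumes pass/fail scoring per scenario, whereas the real evaluation may award partial credit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One model run on one scenario (hypothetical record layout)."""
    scenario: str      # made-up scenario identifier
    tokens_used: int   # tokens consumed when the run ended
    exploited: bool    # True only if the vulnerability was found and exploited


def solve_rate(attempts: list[Attempt], budget: int) -> float:
    """Fraction of scenarios exploited without exceeding the token budget."""
    if not attempts:
        return 0.0
    solved = sum(1 for a in attempts if a.exploited and a.tokens_used <= budget)
    return solved / len(attempts)


# Illustrative data only; these are NOT results from the article.
attempts = [
    Attempt("scenario-01", 2_000_000, True),
    Attempt("scenario-02", 12_000_000, True),
    Attempt("scenario-03", 25_000_000, False),
    Attempt("scenario-04", 4_500_000, True),
]

for budget in (5_000_000, 25_000_000):
    print(f"{budget:>10,} tokens: {solve_rate(attempts, budget):.0%}")
```

Under this assumption, a run that succeeds only after the budget is exhausted counts as a failure at that budget, which is why the same model can score differently at 5M and 25M tokens.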

This evaluation is nigh impossible to game; the model either finds the exploit or it doesn't, and the vulnerabilities are not public information. This outperformance also holds for other Chinese models that are alleged to have been distilled from Opus 4.6, like Qwen 3.6 Plus: they regularly outperform the model they're alleged to have been distilled from, even in real-world tasks.

[Figure: Qwen 3.6 Plus]

And on the point of "at a fraction of the cost", the small Qwen model – a quantized version of which can run on a laptop – also regularly outperforms Opus 4.6 in some cybersecurity scenarios.

[Figure: Qwen 3.6 35B A3B]

This shift happened recently. The last generation of small Qwen models could not even complete the evaluation, and the large Qwen 3.5 model vastly underperformed on this evaluation despite scoring well on public benchmarks.

[Figure: Qwen 3.5 397B A17B]

This set of evaluations is not representative of the full gamut of these models' abilities; the small Qwen model is not a drop-in replacement for Opus. And it is true that American labs still hold the frontier: Opus 4.7 and GPT 5.5 both outperform GLM 5.1.

[Figure: Opus 4.7 (max)]

Still, a distilled model's capabilities are not a strict subset of its teacher's; distilled models can exceed their teachers in many domains of interest, including cybersecurity.
