
Chinese models are sometimes better, even if they're distilled

Chinese AI models GLM 5.1 and Qwen 3.6 Plus, allegedly distilled from Anthropic's Opus 4.6, outperform the original on internal cybersecurity evaluations, including real-world vulnerability exploitation at a major US bank. This challenges the White House claim that distilled models are strictly inferior, though US frontier models like Opus 4.7 and GPT 5.5 still lead overall.

2 min read · published Jun 25, 2026

Dual Use · Noah Lebovic · April 24, 2026

Many folks, including the White House and Anthropic, are accusing Chinese models of being distilled from American frontier labs' models. As part of these accusations, people often claim that distilled models are strictly inferior:

"Models developed from surreptitious, unauthorized distillation campaigns like this do not replicate the full performance of the original. They do, however, enable foreign actors to release products that appear to perform comparably on select benchmarks at a fraction of the cost."[White House memorandum from April 23rd, 2026]

This was true a few months ago, but it no longer consistently holds. GLM 5.1's cybersecurity capabilities are a good example of this.

GLM 5.1, released a few weeks ago, is alleged to be distilled from Opus 4.6. It outperformed Opus 4.6 on many public benchmarks – which is, admittedly, still in line with the White House memo. But it also outperformed Opus 4.6 on an internal cybersecurity evaluation in which the model must re-find and exploit vulnerabilities that we've previously found, including an account takeover at a major American bank.

Figure: GLM 5.1 and Claude Opus 4.6. Performance on ten penetration testing scenarios at 5M and 25M token budgets. A perfect score means the model autonomously found and exploited the vulnerability. See "Benchmarking open-weight models for security research" for methods and additional models.
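To make the scoring concrete, here is a minimal sketch of how pass/fail outcomes at each token budget could be tallied. It is an illustration under stated assumptions, not the harness used for this evaluation: the `Attempt` record, the `solve_rate` helper, the scenario names, and the outcomes are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    """One model run on one scenario (hypothetical record layout)."""
    scenario: str       # placeholder scenario identifier
    token_budget: int   # token cap for the run, e.g. 5_000_000
    exploited: bool     # True only if the bug was both found and exploited


def solve_rate(attempts: list[Attempt], token_budget: int) -> float:
    """Fraction of scenarios fully solved at the given token budget."""
    runs = [a for a in attempts if a.token_budget == token_budget]
    if not runs:
        raise ValueError(f"no attempts recorded at budget {token_budget}")
    return sum(a.exploited for a in runs) / len(runs)


# Placeholder outcomes for illustration only; these are NOT results
# from the article.
attempts = [
    Attempt("scenario-01", 5_000_000, False),
    Attempt("scenario-01", 25_000_000, True),
    Attempt("scenario-02", 5_000_000, True),
    Attempt("scenario-02", 25_000_000, True),
]

for budget in (5_000_000, 25_000_000):
    print(f"{budget:,} tokens: {solve_rate(attempts, budget):.0%} solved")
```

Treating each run as a binary outcome keeps the metric hard to inflate: a run counts only if the exploit actually succeeded within the budget.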

This evaluation is nigh impossible to game; the model either finds the exploit or it doesn't, and the vulnerabilities are not public information. This outperformance holds true for other Chinese models that are alleged to be distilled from Opus 4.6, like Qwen 3.6 Plus: they regularly outperform the model they're alleged to be distilled from, even in real-world tasks.

Figure: Qwen 3.6 Plus

And on the point of "at a fraction of the cost", the small Qwen model – a quantized version of which can run on a laptop – also regularly outperforms Opus 4.6 in some cybersecurity scenarios.

Figure: Qwen 3.6 35B A3B

This shift happened recently. The last generation of small Qwen models could not even complete the evaluation, and the large Qwen 3.5 model vastly underperformed on this evaluation despite scoring well on public benchmarks.

Figure: Qwen 3.5 397B A17B

This set of evaluations is not representative of the full gamut of these models' abilities; the small Qwen model is not a drop-in replacement for Opus. And it is true that American labs still hold the frontier; Opus 4.7 and GPT 5.5 both outperform GLM 5.1.

Figure: Opus 4.7 (max)

Still, a distilled model's capabilities are not a strict subset of its teacher's; distilled models can exceed the teacher model's abilities in many domains of interest, including cybersecurity.
