[ARTICLE · art-40824] src=1chat.com ↗ pub= topic=large-language-models verified=true sentiment=· neutral

Was GLM-5.2 trained on Opus 4.5 outputs?

Evidence suggests that GLM-5.2, an open-weight MoE LLM performing at Claude Opus 4.5 level, was likely trained using outputs from Anthropic's Claude models. LLM-Fingerprinter classifies GLM-5.2 as Claude Opus 4.5 with 99.6% confidence, and Slop Forensics shows that GLM-5.2's writing style closely matches that of Claude Opus 4.5. If distillation occurred, future open-weight frontier models may be at risk as labs step up their detection measures.

Published Jun 26, 2026 · 3 min read

Recently there has been a lot of excitement about GLM-5.2, an open-weight MoE LLM that performs at Claude Opus 4.5 level in the chat arena and outperforms all models except Claude Fable in the WebDev arena [1].

Even though it is very good that this level of intelligence is now public, it is important to know whether it was achieved independently or by distilling other models.

Some may be concerned about the ethics or legality of distillation, and others may point out that Anthropic, OpenAI, and Google themselves distill a lot of public and non-public human knowledge without much asking [2][3]. We don't make that judgement here.

What we think is important is this: if GLM-5.2 was distilled, then what we are enjoying right now is temporary, because the source labs will add even more countermeasures (distillation detection, KYC, etc.), which will eventually prevent the creation of new open-weight frontier-level models.

So what evidence do we have that GLM-5.2 was trained using another model?

Many users on X noticed that the way GLM-5.2 reasons and answers is similar in style to Claude Opus.

One approach to detecting whether a model A is related to another model B is to train a classifier on the outputs of models B, C, D, E, F, and G for a set of carefully selected prompts, then feed the same prompts to model A and see which of the known models the classifier thinks it is.

LLM-Fingerprinter[4] does exactly that. It uses 31 prompts across 3 layers (discriminative → behavioral → stylistic):

Discriminative (11): Identity, knowledge cutoff, architecture, reasoning

Behavioral (7): Safety boundaries, jailbreak resistance, honesty, policy handling

Stylistic (13): Formatting, creativity, constraint following, default voice
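The train-then-classify idea described above can be illustrated with a minimal sketch. This is not LLM-Fingerprinter's actual implementation: the character-trigram features, the cosine-similarity scoring, the model names `model-b` and `model-c`, and the sample outputs are all assumptions chosen to keep the example small.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profile(outputs):
    """Aggregate the n-gram counts of all outputs from one model."""
    profile = Counter()
    for text in outputs:
        profile.update(char_ngrams(text))
    return profile


def train(outputs_by_model):
    """Build one profile per known model (models B, C, ... in the text)."""
    return {m: build_profile(o) for m, o in outputs_by_model.items()}


def classify(profiles, unknown_outputs):
    """Return the known model most similar to the unknown one, plus all scores."""
    unknown = build_profile(unknown_outputs)
    sims = {m: cosine(p, unknown) for m, p in profiles.items()}
    return max(sims, key=sims.get), sims


# Hypothetical outputs of two known models for the same prompts.
known = {
    "model-b": [
        "Certainly! Here is a concise summary of the key points.",
        "Certainly! Let me walk through the key points step by step.",
    ],
    "model-c": [
        "yo, quick answer: it depends lol",
        "quick answer: nope, not really lol",
    ],
}
# Hypothetical output of the model under test (model A in the text).
unknown = ["Certainly! Here are the key points, step by step."]

profiles = train(known)
best, sims = classify(profiles, unknown)
print(best, sims)
```

Note that a classifier of this kind always answers with one of the candidates it was trained on, so the result should be read as "closest among the known models", not as proof of origin.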

We trained LLM-Fingerprinter on the following models, all selected so that they were current as of Fall 2025 (except Grok), when we think Z.ai was preparing data for a GLM-5 training run:

"anthropic/claude-opus-4.5" # 2025/11/24
"openai/gpt-5.1" # 2025/11/13
"google/gemini-2.5-pro" # 2025/06/17
"meta-llama/llama-3.3-70b-instruct" # 2024/12/06
"meta-llama/llama-4-maverick" # 2025/04/05
"x-ai/grok-4.20" # 2026/03/31
"qwen/qwen3-vl-32b-instruct" # 2025/10/23
"mistralai/ministral-14b-2512" # 2025/12/02
"deepseek/deepseek-chat-v3.1" # 2025/08/21

Given these 9 choices, LLM-Fingerprinter classifies GLM-5.2 as anthropic/claude-opus-4.5 with 99.6% confidence.

Another way to compare models is to take a model's outputs and see which words, phrases, and bigrams/trigrams it uses more often than other models do. If two models share the same uncommon phrases, that may suggest a relation.

The Slop Forensics [5] toolkit by Samuel Paech uses that insight to build phylogenetic trees of LLMs based on their “slop profile”.
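To make the notion of a “slop profile” concrete, here is a minimal sketch that picks out the words, bigrams, and trigrams a model uses disproportionately often compared with a baseline corpus. It is an illustration only and not the toolkit's actual code: the function names, the frequency-ratio scoring, the smoothing constant, and the sample texts are all assumptions.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Return the word n-grams of a text as space-joined strings."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def term_freqs(texts, n_values=(1, 2, 3)):
    """Relative frequency of every word, bigram, and trigram in the texts."""
    counts = Counter()
    for text in texts:
        for n in n_values:
            counts.update(ngrams(text, n))
    total = sum(counts.values()) or 1
    return {term: c / total for term, c in counts.items()}


def slop_profile(model_texts, baseline_texts, top_k=2, smoothing=1e-6):
    """Terms the model over-uses most, relative to the baseline corpus."""
    model = term_freqs(model_texts)
    base = term_freqs(baseline_texts)
    # Ratio of model frequency to (smoothed) baseline frequency.
    ratio = {t: f / (base.get(t, 0.0) + smoothing) for t, f in model.items()}
    return set(sorted(ratio, key=ratio.get, reverse=True)[:top_k])


# Hypothetical creative-writing outputs from one model.
model_texts = ["a rich tapestry of voices", "a tapestry of quiet light"]
# Hypothetical baseline text standing in for "how others write".
baseline_texts = ["a bowl of soup", "a list of quiet voices", "a rich light"]

profile = slop_profile(model_texts, baseline_texts)
print(sorted(profile))  # ['tapestry', 'tapestry of']
```

Common words such as "a" and "of" drop out because the baseline uses them just as often; only the terms that are frequent in the model's text and rare elsewhere survive.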

On “creative writing” outputs, the toolkit places GLM-5.2 as a close relative of claude-opus-4-5-20251101. Their slop profiles are similar, though not identical: GLM-5.2 has a distance of 0.767 from Opus 4.5 and 0.765 from Opus 4.8, implying that only about 23% of slop terms overlap. For comparison, the distance between Opus 4.6 and Opus 4.5 is 0.775, which makes GLM-5.2 slightly closer to Opus 4.5 than Opus 4.6 is.
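The arithmetic behind these numbers can be checked with a short sketch. It assumes the distance behaves like a Jaccard distance, that is, one minus the share of overlapping slop terms. That assumption matches the "about 23%" reading above but is not confirmed to be the toolkit's exact metric. The two toy term sets are hypothetical; the two distances are the ones reported in the text.

```python
def jaccard_distance(a, b):
    """1 minus the fraction of terms shared between two sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


# Hypothetical slop profiles: 2 shared terms out of 6 distinct terms.
profile_x = {"tapestry", "testament to", "quiet hum", "ozone"}
profile_y = {"tapestry", "quiet hum", "flicker", "kaleidoscope"}
toy_distance = jaccard_distance(profile_x, profile_y)
print(round(toy_distance, 3))  # 0.667

# Distances to Opus 4.5 as reported in the article.
reported = {
    "GLM-5.2 vs Opus 4.5": 0.767,
    "Opus 4.6 vs Opus 4.5": 0.775,
}

# Under the Jaccard assumption, overlap = 1 - distance.
overlap = {pair: round(1 - d, 3) for pair, d in reported.items()}
print(overlap)

# The pair with the smaller distance is the closer relative.
closest = min(reported, key=reported.get)
print(closest)  # GLM-5.2 vs Opus 4.5
```

Under that assumption the overlaps come out at roughly 23.3% and 22.5%, so the gap between the two pairs is under one percentage point.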

Given the observations above, can we say that GLM-5.2 was trained to at least imitate Claude Opus 4.5’s response style, likely by using Claude models to generate part of GLM’s synthetic training data?

Very likely, yes: some of the training data appears to have been generated or steered by Claude.

However, this does not mean that all of GLM-5.2’s capabilities were taken from Claude.

At a minimum, Z.ai still had to carefully choose the model architecture, build reinforcement learning environments and data-curation pipelines, develop infrastructure capable of training on hundreds of thousands of GPUs, and ultimately train a model that achieves excellent results, surpassing many other labs that are also competing intensely.

[1] - https://arena.ai/leaderboard/code/webdev

[2] - https://apnews.com/article/anthropic-copyright-authors-settlement-training-f294266bc79a16ec90d2ddccdf435164

[3] - https://www.theverge.com/2023/7/5/23784257/google-ai-bard-privacy-policy-train-web-scraping

[4] - https://github.com/litemars/LLM-Fingerprinter

[5] - https://github.com/sam-paech/slop-forensics