# The distillation attack no API can fully block

> Source: <https://www.devclubhouse.com/a/the-distillation-attack-no-api-can-fully-block>
> Published: 2026-06-25 04:03:52+00:00

Anthropic's 28.8-million-query accusation against Alibaba exposes an uncomfortable truth: when you sell model outputs, you sell the training data too.

[Mariana Souza](https://www.devclubhouse.com/u/mariana_souza)

[Anthropic](https://www.anthropic.com) has accused Alibaba of running what it calls the largest known distillation attack against Claude to date. According to a June 10 letter sent to the U.S. Senate Banking Committee and confirmed by CNBC, Reuters and the AFR, operators tied to Alibaba and its Qwen AI lab made roughly 28.8 million exchanges with Claude across about 25,000 fraudulent accounts between April 22 and June 5. The stated target, per the letter, was Claude's most valuable behavior: software engineering and agentic reasoning, the same capabilities marketed under Anthropic's Mythos Preview.

Strip away the geopolitics and the national-security framing for a second, because the interesting part for engineers is the attack itself. Distillation through a public API is one of those problems where the defender is structurally disadvantaged. The thing you're trying to protect is the exact thing you ship to every paying customer. You can't have it both ways.

## What "distillation attack" actually means here

Knowledge distillation is old and boring in the best sense. The idea (Hinton and colleagues, 2015) was to train a small "student" model to mimic a larger "teacher," originally by matching the teacher's soft probability distribution over outputs. For modern LLMs you don't even need the logits. Black-box distillation works on plain text: you generate a giant corpus of high-quality `(prompt, response)` pairs from a strong model, then supervised-fine-tune your own model on them. The student inherits the teacher's style, reasoning patterns, and a lot of its task competence, for the cost of inference instead of the cost of frontier-scale training.
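For reference, the classic logit-matching objective fits in a few lines. What follows is a minimal standard-library sketch of temperature-softened soft targets, not anything described in the letter; the function names, the temperature of 2.0, and the toy logits are assumptions chosen for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Toy logits over three classes (hypothetical values).
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]   # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]     # ranks the classes the other way round

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print(f"close: {loss_close:.4f}  far: {loss_far:.4f}")
```

The black-box variant has no `teacher_logits` to match. The student is trained with ordinary next-token cross-entropy on the teacher's sampled text, which is why plain API responses are enough.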

This isn't exotic. A big chunk of the open-model ecosystem was bootstrapped exactly this way. Stanford's Alpaca and the early Vicuna models were fine-tuned on outputs generated from OpenAI models. It's cheap, it's effective, and it's why every frontier lab's terms of service explicitly prohibit using their outputs to train a competing model. The technique is legitimate research and a ToS violation at the same time, depending entirely on who's doing it and to what.

What makes Anthropic's claim notable is the industrialization. 25,000 fake accounts isn't a researcher poking at an endpoint. That's an account-farming operation built specifically to spread query volume thin enough to slip under per-account rate limits and abuse heuristics. The fraudulent accounts are the actual exploit. The distillation is just what the harvested data gets used for.
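A quick back-of-the-envelope check shows how thin that spread is. The sketch below uses only the figures reported in the letter and assumes, purely for illustration, that traffic was spread evenly across accounts and days; the real distribution is not public, and the year is taken from the article's publication date.

```python
from datetime import date

TOTAL_EXCHANGES = 28_800_000   # reported as "roughly 28.8 million"
ACCOUNTS = 25_000              # reported as "about 25,000"

# April 22 through June 5, counted inclusively (year assumed to be 2026).
window_days = (date(2026, 6, 5) - date(2026, 4, 22)).days + 1

per_account = TOTAL_EXCHANGES / ACCOUNTS
per_account_per_day = per_account / window_days

print(f"{window_days} days, {per_account:.0f} exchanges per account, "
      f"{per_account_per_day:.1f} per account per day")
# prints: 45 days, 1152 exchanges per account, 25.6 per account per day
```

Under that even-spread assumption, each account averages a couple of dozen exchanges a day, a volume that an ordinary developer account can easily produce.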

## The escalation is the story

This is not the first time Anthropic has named names. Back in February it described three other industrial-scale campaigns it attributed to Chinese labs, and the reported volumes tell a clear story when you line them up against the Alibaba figure.

```mermaid
xychart-beta
    title "Reported exchanges per alleged distillation campaign"
    x-axis [DeepSeek, Moonshot, MiniMax, Alibaba]
    y-axis "Millions of exchanges" 0 --> 30
    bar [0.15, 3.4, 13, 28.8]
```

[DeepSeek](https://www.deepseek.com)'s reported operation was over 150,000 exchanges. Moonshot was cited at over 3.4 million, MiniMax over 13 million, and now Alibaba at 28.8 million. Whatever you make of the attribution, the trajectory is the point. The jump from DeepSeek to Moonshot is more than an order of magnitude, each later figure is a multiple of the one before it (roughly 4x, then roughly 2x), and the largest campaign is nearly 200 times the smallest. Anthropic itself described the earlier wave as growing in "intensity and sophistication." If these numbers are even directionally right, defenders are losing the volume race.

There's a reason for that asymmetry. Generating a few hundred thousand fake accounts is a solved problem for any organization with budget and access to phone-number pools, residential proxies, and stolen or synthetic identities. Detecting them without nuking your legitimate sign-up funnel is genuinely hard. The attacker only needs the heuristics to fail occasionally, at scale. The defender needs them to succeed almost always.
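The asymmetry is easy to put in numbers. The sketch below is illustrative only: the 99% recall, the 1% false-positive rate, and the one million legitimate sign-ups are hypothetical values, and `screening_outcome` is a made-up helper, not any provider's actual pipeline. Only the 25,000-account figure comes from the letter.

```python
def screening_outcome(fraud_signups, legit_signups, recall, false_positive_rate):
    """Expected counts of fraudulent accounts missed and legitimate accounts blocked."""
    missed_fraud = fraud_signups * (1 - recall)
    blocked_legit = legit_signups * false_positive_rate
    return missed_fraud, blocked_legit

# Hypothetical detector: catches 99% of fake accounts, wrongly flags 1% of real ones.
missed, blocked = screening_outcome(
    fraud_signups=25_000,      # account count reported in the letter
    legit_signups=1_000_000,   # hypothetical volume of genuine sign-ups
    recall=0.99,
    false_positive_rate=0.01,
)
print(f"fake accounts that get through: {missed:.0f}")
print(f"legitimate users blocked:       {blocked:.0f}")
```

Even with a detector that good, a few hundred fake accounts survive while thousands of real customers get blocked. At the letter's implied average of 28.8 million / 25,000 = 1,152 exchanges per account, 250 surviving accounts would still yield about 288,000 exchanges.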

## Why prevention isn't really on the table

Here's the blunt version: you cannot technically prevent black-box distillation while running a public inference API. If a request gets a useful answer, that answer can be logged and used as a training label. Full stop. Everything a provider can actually do falls into detection and friction, not prevention.

The realistic toolkit looks like this:

- **Account-level fraud detection.** Phone/KYC verification, device fingerprinting, payment-instrument reputation, and graph analysis to spot account clusters that share infrastructure. This is the front line, and it's where the 25,000-account swarm was supposed to win.
- **Behavioral fingerprinting of usage.** Distillation has a shape. It tends to sweep prompt space for coverage rather than solve one user's recurring problem, skews heavily toward programmatic high-throughput access, and spans an unusually broad topic distribution per account. That's detectable in aggregate even when each individual account looks plausible.
- **Output watermarking.** Statistical watermarks embedded in generated text can later show up in a competitor's model outputs, which is more useful as forensic evidence than as a blocker. It proves provenance after the fact.
- **Rate and capability gating.** Tighter limits on the highest-value endpoints, the agentic and coding paths, which is precisely what the letter says was targeted.
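To make the first two items concrete, here is a minimal sketch of how shared-infrastructure clustering and an aggregate topic-breadth score might fit together. It is an illustration under stated assumptions, not a description of any provider's system: the signal strings, the topic labels, and both function names are hypothetical, and a production pipeline would need far richer features.

```python
import math
from collections import Counter, defaultdict

def cluster_accounts(signals):
    """Group accounts that share any infrastructure signal, using union-find.

    signals maps account_id -> set of opaque strings (hypothetical examples:
    a payment fingerprint, a device hash, a proxy subnet).
    """
    parent = {a: a for a in signals}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    first_seen = {}  # signal -> first account observed with it
    for account, sigs in signals.items():
        for s in sigs:
            if s in first_seen:
                parent[find(account)] = find(first_seen[s])
            else:
                first_seen[s] = account

    clusters = defaultdict(set)
    for a in signals:
        clusters[find(a)].add(a)
    return list(clusters.values())

def topic_entropy(topics):
    """Shannon entropy in bits of a list of topic labels; higher means broader coverage."""
    counts = Counter(topics)
    n = len(topics)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Hypothetical data: a1, a2, a3 are chained by shared signals; b1 is unrelated.
signals = {
    "a1": {"card:x", "dev:1"},
    "a2": {"dev:1", "net:10.0.0"},
    "a3": {"net:10.0.0"},
    "b1": {"card:y"},
}
topics = {
    "a1": ["sql", "sql", "rust"],
    "a2": ["regex", "k8s", "regex"],
    "a3": ["auth", "css", "auth"],
    "b1": ["sql", "sql", "sql"],
}
for cluster in cluster_accounts(signals):
    pooled = [t for a in sorted(cluster) for t in topics[a]]
    print(sorted(cluster), round(topic_entropy(pooled), 2))
```

In this toy data each linked account looks narrow on its own, but the pooled cluster covers a much broader topic range than any single member. That is the "detectable in aggregate" effect.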

Notice what Anthropic's actual response was. Not a clever cryptographic fix. A letter to Congress, threat-intelligence sharing with other labs, and an appeal to policy. That tells you where the company thinks the real leverage is, and it's an implicit admission that the technical defense is porous.

## What this means if you build on these APIs

If you're a legitimate developer shipping on Claude, GPT, or Gemini, you're going to feel this as collateral friction, and it's worth planning for now.

Expect stricter onboarding. More KYC, more phone verification, more scrutiny on new accounts and fresh API keys, especially for high-volume or agentic workloads. A startup spinning up thousands of programmatic sub-accounts for multi-tenant inference is going to look statistically similar to an account farm, and you may get caught in the same net. If your architecture fans out across many keys to dodge rate limits, reconsider, because that pattern is now a red flag.
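One alternative to fanning out is to pace requests on a single key from the client side. The token bucket below is a minimal sketch of that idea; the class, its parameters, and the example rate are assumptions for illustration, and the real limits to configure against are whatever your provider documents for your account.

```python
import time

class TokenBucket:
    """Client-side pacing for a single API key (illustrative sketch)."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec      # tokens refilled per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock            # injectable so the logic can be tested
        self.tokens = capacity
        self.updated = clock()

    def try_acquire(self, cost=1.0):
        """Refill based on elapsed time, then spend `cost` tokens if available."""
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

    def wait_time(self, cost=1.0):
        """Seconds until `cost` tokens would be available at the current level."""
        return max(0.0, (cost - self.tokens) / self.rate)

# Usage: hypothetical limit of 2 requests per second with a burst of 2.
bucket = TokenBucket(rate_per_sec=2, capacity=2)
for i in range(4):
    if not bucket.try_acquire():
        time.sleep(bucket.wait_time())
        bucket.try_acquire()
    # the API call on the single key would go here
```

Queueing behind one well-behaved key keeps the traffic pattern legible to the provider, and a documented request for a higher limit is easier to defend than a swarm of keys.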

Re-read the terms of service, specifically the clause about training competing models on outputs. The line between "generating synthetic data to fine-tune my internal classifier" and "distilling a competitor" is legally fuzzy and getting more politically charged. Teams that have quietly been fine-tuning small open models on GPT-4 or Claude transcripts, a genuinely common practice, are now operating in a space where the providers are actively hunting for exactly that signature and lobbying governments about it.

And if you're on the open-model side, weighing whether to bootstrap a model on a frontier API's outputs, understand the risk has shifted. It used to be a ToS gray area. It's now an export-control and national-security talking point, with the Commerce Department already imposing restrictions on Anthropic's own Mythos and Fable models around the same window. The legal exposure of "distill the expensive model" is climbing fast.

## The take

The attribution will get argued in Washington, and the numbers come from a single company's letter, so treat the specifics as Anthropic's account rather than settled fact. But the underlying engineering reality doesn't depend on who's right about Alibaba. Frontier capability is far easier to copy through an API than to build from scratch, and selling access to a model means selling a machine that emits perfect training data on demand. No provider has a clean technical answer to that, which is why they're all reaching for fraud detection, watermarks, and lawmakers instead of math. For working developers the practical fallout is more verification, more usage monitoring, and a narrowing of what you can quietly do with model outputs. The moat around a frontier model was always going to be thinner than the marketing suggested. This is what it looks like when everyone admits it.

## Sources & further reading

- [Anthropic says Alibaba illicitly extracted Claude AI model capabilities](https://www.reuters.com/world/china/anthropic-says-alibaba-illicitly-extracted-claude-ai-model-capabilities-2026-06-24/) (reuters.com)
- [Anthropic says Alibaba illicitly extracted Claude AI model capabilities - AOL](https://www.aol.com/articles/anthropic-says-alibaba-illicitly-extracted-203048000.html) (aol.com)
- [Anthropic accuses Alibaba of campaign to 'brazenly' and 'illicitly' extract AI capabilities](https://www.cnbc.com/2026/06/24/anthropic-alibaba-distillation-campaign.html) (cnbc.com)
- [Anthropic Accuses Alibaba of Distilling Claude AI Model Capabilities](https://www.globalbankingandfinance.com/anthropic-alibaba-illicitly-extracted-claude-ai-model/) (globalbankingandfinance.com)
- [Anthropic accuses Alibaba of illicitly accessing Claude](https://www.afr.com/world/north-america/anthropic-accuses-china-s-ai-labs-of-industrial-scale-theft-20260625-p609wt) (afr.com)

[Mariana Souza](https://www.devclubhouse.com/u/mariana_souza) · Senior Editor

Mariana covers the fast-moving world of machine learning and generative AI, with a particular focus on how these technologies are reshaping development workflows. When she isn't stress-testing the latest foundation models, she's usually at a local hackathon.
