# Train Your Own AI Image Detector: Why Off-the-Shelf Detectors Fail on Your Data (DINOv2 + ConvNeXt…

> Source: <https://pub.towardsai.net/train-your-own-ai-image-detector-why-off-the-shelf-detectors-fail-on-your-data-dinov2-convnext-f5d1b992a71e?source=rss----98111c9905da---4>
> Published: 2026-06-19 14:31:01+00:00

Companion toHow I Distilled a Gemini Vision Model into a 4.6M-Parameter Model. Last time, the lesson was: youdon’tneed to fine-tune a backbone — freeze a big model, train a tiny head, ship it. This time I needed the backbone. The reason is the whole point.

A fashion-discovery feed is only as good as its images. And a growing share of what people upload now isn’t a photo of a real outfit — it’s an AI render. Some of it is stunning. Most of it is *slop*: plastic skin, six-fingered hands, a dress that ignores gravity. Left alone, it rots the feed.

The job: **detect AI-generated images so we can filter them.**

My first instinct was the lazy one — yours probably is too: *someone has already built this AI image detector.* And they have, dozens of times. Hugging Face is full of “AI vs. human” detectors you can download and self-host this afternoon, and the brochures are glorious. One [recent detector](https://huggingface.co/Bombek1/ai-image-detector-siglip-dinov2) reports a **0.9997 AUC** on its validation set. Near-perfect.

So I downloaded a popular one and measured how well it agreed with the AI-probability labels we already trust in production. That measurement is the first snippet — and it’s the one that set the whole project in motion:

``` python
from transformers import pipelinefrom sklearn.metrics import roc_auc_score# a popular off-the-shelf detector - downloaded and run locallyclf = pipeline("image-classification", model="Organika/sdxl-detector", device="mps")def p_ai(path):                       # the detector's probability that an image is AI    out = {d["label"]: d["score"] for d in clf(path)}    return out["artificial"]preds  = [p_ai(p) for p in sample_paths]y_true = [s >= 0.5 for s in teacher_ai_score]     # our production labelsprint(roc_auc_score(y_true, preds))               # 0.68 - on *our* images
```

It scored **0.68.** A throwaway linear model I’d wire up in the next section scored **0.82.**

Three numbers — 0.9997, 0.68, 0.82. The first is what the shelf *advertises*. The second is what a real, published detector did on *our* data. The third is a ten-minute warm-up beating it. The whole article lives in that gap, and it isn’t a story about one bad model.

This is a structural limit the research community keeps shouting about, and almost nobody building products hears.

AI-generated image detection is *not solved.* At ICLR 2025, a paper bluntly titled [ A Sanity Check for AI-Generated Image Detection](https://github.com/shilinyan99/AIDE) built a deliberately hard benchmark and found that

There are two cliffs.

**Cross-generator.** Train a detector on GAN images and it falls apart on diffusion images — each generator family leaves a different fingerprint. You’re always detecting *yesterday’s* fakes.

**Cross-objective — the one that got me.** My detector wasn’t asked “is this AI?” in the abstract. It was asked to **agree with the specific AI-scores we run in production**, on **our** image distribution — Pinterest-style fashion, not whatever set the downloaded model grew up on. Different target, different data, same word “fake” meaning two different things.

No download fixes that. The signal lives in *your* data, so you have to train on *your* data — which sounds expensive, right up until it isn’t.

Generators build images by **upsampling** — doubling resolution again and again to fill in detail. That leaves faint, regular **fingerprints in the frequency domain** near the high-frequency edges of the image: periodic patterns your eye never registers but a model can ([Frank et al., ICML 2020](https://proceedings.mlr.press/v119/frank20a/frank20a.pdf); Durall et al., CVPR 2020; CNNSpot, CVPR 2020). Detection isn’t about *content* — “is this a plausible dress?” It’s about *texture statistics*: the microscopic signature of how the pixels were synthesized.

Hold that thought. It’s why the cheap trick almost worked — and why it didn’t quite.

In my [last piece](https://medium.com/design-bootcamp/how-i-distilled-a-gemini-vision-model-into-a-4-6m-parameter-a44a6c40e4c2), the move was: don’t train a vision model — freeze a big one, grab its embeddings, train a tiny head. So I tried it here: every image through **DINOv2** (Meta’s self-supervised ViT) → a 768-dim embedding → a dead-simple logistic regression. No fine-tuning yet — just a frozen DINOv2 feature extractor and a linear probe.

``` python
import timm, torchfrom sklearn.linear_model import LogisticRegressionfrom sklearn.preprocessing import StandardScalerdino = timm.create_model("vit_base_patch14_dinov2.lvd142m",                         pretrained=True, num_classes=0).eval()   # frozen - pure inference@torch.no_grad()def embed(batch):                       # batch: (N, 3, 224, 224)    return dino(batch).cpu().numpy()    # (N, 768)# X = DINOv2 embeddings for ~36k images; y = (teacher_ai_score >= 0.5)Xs = StandardScaler().fit_transform(X)probe = LogisticRegression(max_iter=2000, class_weight="balanced").fit(Xs, y)
```

**0.82 ROC-AUC.** From a *linear* model. No fine-tuning, no GPU, a few seconds of training.

And here’s where the lazy hot-take (“frozen embeddings are useless for detection”) is just wrong: that 0.82 isn’t a failure — it’s a strong, free baseline, and the literature predicted it. [UnivFD (CVPR 2023)](https://www.researchgate.net/publication/373323377) showed a linear classifier on frozen CLIP features *out-generalizes CNNs trained from scratch*. And [recent forensics work](https://arxiv.org/abs/2411.19117) found **DINOv2 is especially good** here — its self-supervised features preserve fine-grained texture better than CLIP’s semantic ones, so a linear head on DINOv2 can beat fully supervised models.

The frozen DINOv2 baseline got me most of the way for free. The only question left: is the last mile worth it?

Before fine-tuning anything, I looked at the label distribution. This is the snippet I run on every new dataset, and it saved me an afternoon of chasing the wrong fix:

``` python
import pandas as pdlabels = pd.read_parquet("labels.parquet")          # one row/image: ai_score in [0, 1]print((labels.ai_score < 0.1).mean())               # 0.75  -> 3 in 4 are obviously "real"print((labels.ai_score >= 0.5).mean())              # 0.11  -> only ~11% are "likely AI"
```

Three out of four images sat near zero; only ~11% were “likely AI.” I wrote **“will need class weighting”** in my notes before training a thing. Remember that — it’s the plot twist.

The frozen DINOv2 probe plateaued at 0.82 because a frozen backbone only hands you the signal it *already* encodes. To surface *this teacher’s* artifacts, the model has to see raw pixels and adapt its own low-level filters. So I fine-tuned a small CNN end-to-end — **ConvNeXt-Tiny** (not a random pick: a strong ICLR 2025 detector, [AIDE](https://github.com/shilinyan99/AIDE), is also ConvNeXt-based).

Three choices mattered, and every one is *backwards* from normal image classification. The first one is a custom transform, and it’s the most important line in the whole project:

``` python
import randomfrom PIL import Imageclass NativeCrop:    """Take a 224px square at NATIVE resolution. NEVER downscale - that smears the high-frequency    artifacts we're trying to detect. Only upscale when the image is smaller than the crop."""    def __init__(self, size=224, train=True):        self.size, self.train = size, train    def __call__(self, img: Image.Image) -> Image.Image:        w, h, s = *img.size, self.size        if min(w, h) < s:                                   # too small -> upscale just enough            scale = s / min(w, h)            img = img.resize((round(w * scale), round(h * scale))); w, h = img.size        left = random.randint(0, w - s) if self.train else (w - s) // 2        top  = random.randint(0, h - s) if self.train else (h - s) // 2        return img.crop((left, top, left + s, top + s))
```

The other two choices:

Now the model, the loss, and the training step. The loss is where the plot twist pays off:

``` python
import timm, torch, torch.nn as nnmodel = timm.create_model("convnext_tiny", pretrained=True, num_classes=1).to("mps")# soft-label distillation: BCE against the teacher's 0–1 score.# pos_weight counters the ~11%-positive imbalance we found above.pos_weight = torch.tensor((1 - y.mean()) / y.mean())      # ≈ 6loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)opt = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)for x, target in loader:                                  # target = teacher's soft 0–1 score    with torch.autocast("mps", dtype=torch.bfloat16):        loss = loss_fn(model(x.to("mps")).squeeze(1), target.to("mps"))    opt.zero_grad(); loss.backward(); opt.step()
```

Here’s the twist I set up earlier. My first fine-tune used a *plain* loss — no pos_weight — as a baseline. It did exactly what an imbalanced dataset makes a model do: it **plateaued at 0.785.** *Below the free probe.* With a plain loss, the model found the cheapest possible strategy — call everything "real," eat the 11% error, go home. Low loss. Useless detector.

The fix wasn’t a bigger model or more epochs. It was the pos_weight ≈ 6 you already saw — **one line** — so the rare AI images actually pulled on the gradient. That single change broke the plateau and the model climbed to **0.86 validation, 0.84–0.875 on the held-out test set.**

Same model. Same data. One line.

When a model stalls, suspect the loss and the data before you touch the architecture. I keep relearning this, and I keep writing it down so maybe one day I’ll learn it.

A detector that nails the training set and flops in production is worse than useless. So the last snippet is the one that earns trust: the held-out evaluation, reported the way an imbalanced problem demands — **ROC-AUC and PR-AUC**, not accuracy (with ~11% positives, “always predict real” scores 89% accuracy and catches zero fakes).

``` python
from sklearn.metrics import roc_auc_score, average_precision_score@torch.no_grad()def evaluate(model, loader):    p, y = [], []    for x, target in loader:        p += torch.sigmoid(model(x.to("mps")).squeeze(1)).cpu().tolist()        y += target.tolist()    yb = [t >= 0.5 for t in y]    return roc_auc_score(yb, p), average_precision_score(yb, p)# train-sample vs held-out test: gap ≈ 0.035 -> generalizing, not memorizingprint(evaluate(model, test_loader))     # (0.843, ...)
```

The train-vs-test ROC gap came out to ≈ **0.035** — basically nothing. It generalized.

A **ConvNeXt-Tiny** AI-image detector, fine-tuned end-to-end, agreeing with our production AI-scores at **~0.84–0.875 ROC-AUC**. The payoff still feels slightly illegal:

The detector I *couldn’t* download, I trained over an afternoon for the price of some electricity.

Three caveats, said out loud so you don’t have to find them:

When you need a model, the reflex is to go shopping. For AI-generated image detection the shelf is *full* — and most of it won’t fit your data. That’s not bad luck; it’s the documented generalization gap, and it’s the title of this article.

But the reflex hides the real lesson: **training your own AI image detector stopped being the expensive option.** A frozen foundation model hands you a strong baseline for free; a few minutes of fine-tuning on a laptop closes the gap. You don’t need the GPU cluster you were dreading. You need the right loss, native-resolution crops, and the nerve to close the download tab and open a notebook.

Last time I told you that you don’t need to fine-tune a backbone. This time I did — and now you can tell which room you’re standing in.

The detector you can download was trained on someone else’s images. Yours weren’t.

*If you’ve built a model that had to survive contact with real, messy, in-the wild data — what broke? I read every reply.*

[Train Your Own AI Image Detector: Why Off-the-Shelf Detectors Fail on Your Data (DINOv2 + ConvNeXt…](https://pub.towardsai.net/train-your-own-ai-image-detector-why-off-the-shelf-detectors-fail-on-your-data-dinov2-convnext-f5d1b992a71e) was originally published in [Towards AI](https://pub.towardsai.net) on Medium, where people are continuing the conversation by highlighting and responding to this story.