Representation-Conditioned Diffusion Models for Guided Training Data Generation

wpnews.pro

[ARTICLE · art-16013] src=arxiv.org ↗ pub=2026-05-28T04:00Z topic=generative-ai verified=true sentiment=↑ positive

Researchers have developed representation-conditioned diffusion models that generate synthetic training images by conditioning on learned representations from DINOv2, DINOv3, and CLIP, achieving a 10.76 percentage point improvement in top-1 accuracy over class-conditioned generation on ImageNet100. Scaling the synthetic dataset allowed a classifier to outperform one trained on real data by 2.0 percentage points. The findings suggest that this approach could augment, complement, or replace real-world datasets in large-scale visual learning tasks.

Published May 28, 2026

arXiv:2605.27495v1. Abstract: Data availability remains a critical bottleneck in many deep learning applications. Large-scale datasets are often expensive to collect, curate, and annotate, which can limit the scalability and applicability of supervised learning methods. In this work, we evaluate the classification performance of models trained on synthetic image datasets produced by generative deep learning. In particular, we use latent diffusion models conditioned on learned representations from DINOv2, DINOv3, and CLIP. Our results demonstrate that this representation-conditioned formulation outperforms class-conditioned generation by a large margin (+10.76 p.p. top-1 accuracy on ImageNet100) by improving sample quality and mode coverage. Furthermore, by scaling the size of the synthetic dataset, we are able to outperform a classifier trained on the real data (+2.0 p.p. top-1 accuracy). We also demonstrate how generated images can be used for augmentation, outperforming classical augmentation methods, and how the conditioning space can be used for sample filtering to further improve training value. Collectively, these findings highlight that representation-conditioned diffusion models provide a promising approach for augmenting, complementing, or potentially replacing real-world datasets in large-scale visual learning tasks.
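The abstract does not say how filtering in the conditioning space is carried out. As a rough illustration only, one plausible reading is to embed each generated image with the same encoder used for conditioning and keep it only if its embedding stays close to a per-class prototype. The sketch below assumes that reading. The function names, the mean-embedding prototype, the cosine-similarity criterion, and the threshold of 0.5 are all hypothetical choices, not the paper's method, and the toy 2-D vectors stand in for real DINOv2, DINOv3, or CLIP embeddings.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def class_prototypes(
    embeddings: Sequence[Vector], labels: Sequence[int]
) -> Dict[int, List[float]]:
    """Mean embedding per class (hypothetical choice of prototype)."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for emb, label in zip(embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(emb)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], emb)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def filter_generated(
    gen_embeddings: Sequence[Vector],
    gen_labels: Sequence[int],
    prototypes: Dict[int, List[float]],
    threshold: float = 0.5,  # assumed value; would need tuning in practice
) -> List[int]:
    """Indices of generated samples whose embedding is close to the
    prototype of the class they were generated for."""
    kept = []
    for i, (emb, label) in enumerate(zip(gen_embeddings, gen_labels)):
        if label not in prototypes:
            continue  # no reference for this class, so drop the sample
        if cosine_similarity(emb, prototypes[label]) >= threshold:
            kept.append(i)
    return kept


# Toy data: 2-D vectors standing in for encoder embeddings.
real_embeddings = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
real_labels = [0, 0, 1, 1]
gen_embeddings = [[1.0, 0.1], [0.1, 1.0], [0.0, 1.0]]
gen_labels = [0, 0, 1]

prototypes = class_prototypes(real_embeddings, real_labels)
kept_indices = filter_generated(gen_embeddings, gen_labels, prototypes)
```

In this toy setup the second generated sample carries label 0 but lies near the class 1 prototype, so it is the one discarded. A real pipeline would compute the embeddings with the actual encoder and choose the threshold on held-out data.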

source & further reading

Read original on arxiv.org → arxiv.org/abs/2605.27495
