{"slug": "diffusion-based-ukrainian-handwritten-text-generation-with-cross-domain-style", "title": "Diffusion-Based Ukrainian Handwritten Text Generation with Cross-Domain Style Transfer", "summary": "Researchers at the University of Cambridge have developed the first large-scale Ukrainian handwritten text generation system, creating a dataset of 126,177 images from 308 writers and retraining a diffusion-based model to produce legible, style-consistent Cyrillic text. The system successfully demonstrates cross-domain style transfer from Latin to Cyrillic scripts, including zero-shot generation of early 20th-century Ukrainian manuscripts and few-shot imitation of contemporary writers. The released dataset and models provide a reproducible benchmark for extending writer-aware handwritten text generation to other underrepresented writing systems.", "body_md": "arXiv:2605.27487v1\nAbstract: Handwritten text generation (HTG) conditioned on writer style has been widely studied for Latin scripts, but remains underexplored for low-resource and non-Latin writing systems, leaving open how well existing models generalise beyond the Latin domain. Cyrillic, particularly Ukrainian, lacks both large-scale writer-labeled datasets and empirical evidence of such generalisation. To address this gap, we construct a Ukrainian handwritten word dataset of 126,177 images from 308 writers using connected-component segmentation, quality filtering, and targeted oversampling of underrepresented Ukrainian characters. We retrain DiffusionPen, a MobileNetV2 triplet-loss style encoder with a CANINE-conditioned latent diffusion U-Net, on this dataset without architectural modification, testing direct transfer from Latin to Cyrillic. We evaluate cross-domain style transfer in three settings: cross-lingual transfer from IAM English samples, zero-shot transfer to an early 20th-century Ukrainian manuscript, and few-shot imitation of contemporary writers. 
The model produces legible, style-consistent word images, indicating that few-shot latent diffusion models generalize beyond the Latin-script domain. We release the dataset, trained models, and evaluation protocol as a reproducible benchmark for writer-aware Cyrillic HTG, providing a foundation for extending stylized HTG to other underrepresented writing systems.", "url": "https://wpnews.pro/news/diffusion-based-ukrainian-handwritten-text-generation-with-cross-domain-style", "canonical_source": "https://arxiv.org/abs/2605.27487", "published_at": "2026-05-28 04:00:00+00:00", "updated_at": "2026-05-28 04:27:09.570241+00:00", "lang": "en", "topics": ["generative-ai", "computer-vision", "machine-learning", "neural-networks", "ai-research"], "entities": ["DiffusionPen", "MobileNetV2", "CANINE", "IAM", "Cyrillic", "Ukrainian"], "alternates": {"html": "https://wpnews.pro/news/diffusion-based-ukrainian-handwritten-text-generation-with-cross-domain-style", "markdown": "https://wpnews.pro/news/diffusion-based-ukrainian-handwritten-text-generation-with-cross-domain-style.md", "text": "https://wpnews.pro/news/diffusion-based-ukrainian-handwritten-text-generation-with-cross-domain-style.txt", "jsonld": "https://wpnews.pro/news/diffusion-based-ukrainian-handwritten-text-generation-with-cross-domain-style.jsonld"}}