CNNs, Transformers, Hybrid, and Vision Language Models for Skin Cancer Detection

A unified evaluation of twelve deep learning models for binary skin cancer detection on the PAD-UFES-20 dataset found that well-tuned CNNs provide strong baselines, but transformer-based families consistently improve discrimination. Hybrid models (MaxViT Tiny, CoAtNet0) and a SigLIP-based vision language model achieved the best overall trade-off between ranking performance and clinically relevant operating points, while a CLIP-based model offered high precision. The findings offer practical guidance on model family suitability for real-world skin cancer screening deployment and establish a reproducible reference point for future work on the dataset.

Published May 27, 2026

arXiv:2605.26294v1

Abstract: Skin cancer is a common and fast-rising malignancy worldwide. Early detection is critical for improving outcomes. Deep learning models trained on dermoscopic and clinical images can support fast, automated triage. However, many studies evaluate only a limited set of architectures, and experimental setups vary across studies. In this paper, we present a unified evaluation of twelve deep learning models for binary skin cancer detection on the PAD-UFES-20 dataset. The models span four families: convolutional neural networks (CNN), vision transformers (ViT), hybrid convolution-transformer backbones, and vision language models (VLM). Performance is assessed using AUC, the maximum F1 score with its precision and recall, and sensitivity at 80% specificity, reflecting screening-oriented requirements. Our results show that well-tuned CNNs already provide strong baselines, but transformer-based families consistently improve discrimination. Hybrid models (MaxViT Tiny, CoAtNet0) and a SigLIP-based VLM achieve the best overall trade-off between ranking performance and clinically relevant operating points, while a CLIP-based model offers high precision. The full codebase for all experiments is publicly released. Together, these findings offer practical guidance on which model families are most suitable for real-world deployment in skin cancer screening and establish a reproducible reference point for future work on PAD-UFES-20.
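The three metrics named in the abstract (AUC, maximum F1 with its precision and recall, and sensitivity at 80% specificity) can be illustrated with a minimal sketch in plain Python. This is not the paper's released code: the function names, the threshold convention (predict positive when score >= threshold), and the toy labels and scores are all assumptions made up for illustration, and the paper's exact definitions may differ.

```python
from typing import List, Optional, Tuple


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def max_f1(
    y_true: List[int], scores: List[float]
) -> Tuple[float, float, float, Optional[float]]:
    """Scan every observed score as a threshold and return
    (best F1, precision, recall, threshold)."""
    best: Tuple[float, float, float, Optional[float]] = (0.0, 0.0, 0.0, None)
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < t)
        if tp == 0:
            continue  # F1 is zero (or undefined) without true positives
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best[0]:
            best = (f1, precision, recall, t)
    return best


def sensitivity_at_specificity(
    y_true: List[int], scores: List[float], target_spec: float = 0.80
) -> float:
    """Highest sensitivity over all thresholds whose specificity is at
    least target_spec."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = sum(1 for y in y_true if y == 0)
    best = 0.0
    # The extra +inf threshold covers the "predict nothing positive" case.
    for t in sorted(set(scores)) + [float("inf")]:
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        if tn / n_neg >= target_spec:
            best = max(best, tp / n_pos)
    return best


# Toy data, invented for illustration only (1 = malignant, 0 = benign).
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.7, 0.35, 0.6, 0.8, 0.9, 0.95]

print("AUC:", roc_auc(y_true, scores))
print("max F1 (F1, precision, recall, threshold):", max_f1(y_true, scores))
print("sensitivity @ 80% specificity:", sensitivity_at_specificity(y_true, scores))
```

The sketch also shows why the metrics are reported together: AUC summarizes ranking quality across all thresholds, while the other two describe specific operating points, so a model can rank well overall and still differ from another at the screening threshold.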
