Benchmarking Convolutional, Transformer, Hybrid, and Vision Language Models for Multi Disease Retinal Screening

wpnews.pro


Source: arxiv.org · Published: 2026-05-27 04:00 UTC · Topic: computer vision


A new study benchmarked twelve deep learning architectures from four model families (convolutional neural networks, vision transformers, hybrid CNN-transformer backbones, and vision-language models) for multi-disease retinal screening on the RFMiD dataset. Attention-based models performed best: SwinTiny, CoAtNet0, and MaxViTTiny achieved the strongest binary screening results and improved macro and micro F1 on multi-label classification across 28 disease classes. Vision-language models were competitive with CNN baselines but did not surpass the top transformer and hybrid backbones. In external validation on Messidor-2 for referable diabetic retinopathy, AUC ranged from 66.8% to 84.7% across models. The authors present the results as a reproducible reference for model selection in automated retinal screening.
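The multi-label task scores every image against 28 disease classes, and the paper reports both macro and micro F1. The two averages can diverge sharply on an imbalanced label set, so a minimal pure-Python sketch of each may help. It assumes predictions are binarized per class at a hypothetical fixed 0.5 threshold (the study's thresholding rule is not given here), and that a class with no true and no predicted positives scores F1 = 0, which is one common convention rather than necessarily the one the authors used. All function names are illustrative.

```python
from typing import List, Sequence, Tuple


def binarize(probs: Sequence[Sequence[float]], threshold: float = 0.5) -> List[List[int]]:
    """Turn per-class probabilities into 0/1 labels (hypothetical fixed threshold)."""
    return [[1 if p >= threshold else 0 for p in row] for row in probs]


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when the denominator is 0 (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(y_true: Sequence[Sequence[int]],
                  y_pred: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Return (macro_f1, micro_f1) for 0/1 matrices shaped [n_images][n_classes]."""
    n_classes = len(y_true[0])
    per_class_f1 = []
    total_tp = total_fp = total_fn = 0
    for c in range(n_classes):
        # Confusion-matrix counts for class c, pooled over all images.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 0 and p[c] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[c] == 1 and p[c] == 0)
        per_class_f1.append(f1_from_counts(tp, fp, fn))
        total_tp += tp
        total_fp += fp
        total_fn += fn
    macro = sum(per_class_f1) / n_classes                  # unweighted mean of per-class F1
    micro = f1_from_counts(total_tp, total_fp, total_fn)   # F1 of counts summed over classes
    return macro, micro


if __name__ == "__main__":
    # Toy data: 4 images, 3 classes (RFMiD itself has 28 disease labels).
    probs = [[0.9, 0.1, 0.2], [0.3, 0.8, 0.1], [0.2, 0.1, 0.4], [0.7, 0.2, 0.1]]
    y_true = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 0, 0]]
    macro, micro = multilabel_f1(y_true, binarize(probs))
    print(f"macro F1 = {macro:.3f}, micro F1 = {micro:.3f}")  # macro F1 = 0.500, micro F1 = 0.571
```

Macro F1 weights every class equally, so a rare disease the model misses entirely (class 2 in the toy data) pulls it down as much as a common one; micro F1 pools the counts and is dominated by frequent classes. With array inputs, scikit-learn's `f1_score` with `average="macro"` or `average="micro"` computes the same two quantities on multi-label indicator matrices.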


Abstract (arXiv:2605.26283v1): Modern deep learning offers powerful tools for automated retinal screening, but it remains unclear how different visual model families compare in realistic multi-disease settings and under domain shift. In this work, we benchmark twelve architectures across four model families: convolutional neural networks, vision transformers, hybrid CNN-transformer backbones, and vision-language models, using the Retinal Fundus Multi-disease Image Dataset (RFMiD). We evaluate two tasks: binary screening for any retinal disease and multi-label classification across 28 disease classes. Using standardized training, calibration, and evaluation protocols, we report AUC, F1, precision, recall, and sensitivity at a clinically relevant operating point with specificity near 80%. On RFMiD, all architectures perform well on binary screening, with AUC above 84%, but attention-based models perform best. SwinTiny and the hybrid CoAtNet0 and MaxViTTiny models achieve the strongest binary screening results and improve macro and micro F1 in the multi-label setting. Vision-language models, including CLIP ViT-B/16 and SigLIP-Base384, are competitive with CNN baselines but do not surpass the best transformer and hybrid backbones. In external validation on Messidor-2 for referable diabetic retinopathy, AUC ranges from 66.8% to 84.7%, with hybrid and transformer models again showing strong performance. These results provide a reproducible reference for model selection in multi-disease retinal screening and guide future automated screening tools for clinical deployment.
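For the binary "any disease" task, the abstract pairs AUC with sensitivity at an operating point where specificity is near 80%. A minimal pure-Python sketch of both follows. It assumes one common rule for choosing the operating point: take the lowest score threshold whose specificity is at least the target, which maximizes sensitivity under that constraint, and call an image positive when its score is at or above the threshold. The study may instead fix the threshold on a validation split or use another rule; the function names and toy scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def _split(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Separate scores of positive (label 1) and negative (label 0) images."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    return pos, neg


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(score_pos > score_neg), counting ties as 1/2 (O(P*N), for clarity)."""
    pos, neg = _split(scores, labels)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(scores: Sequence[float], labels: Sequence[int],
                               target_spec: float = 0.80) -> Tuple[float, float, float]:
    """Return (threshold, sensitivity, specificity) at the lowest threshold with spec >= target."""
    if not 0.0 < target_spec <= 1.0:
        raise ValueError("target_spec must be in (0, 1]")
    pos, neg = _split(scores, labels)
    # Candidate thresholds: every observed score, plus +inf (call nothing positive).
    for t in sorted(set(scores)) + [float("inf")]:
        spec = sum(1 for s in neg if s < t) / len(neg)    # negatives correctly called negative
        if spec >= target_spec:
            sens = sum(1 for s in pos if s >= t) / len(pos)  # positives correctly called positive
            return t, sens, spec
    raise RuntimeError("no threshold reached the target specificity")  # unreachable: +inf gives 1.0


if __name__ == "__main__":
    scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.35, 0.6, 0.7, 0.8, 0.9]
    labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    print(f"AUC = {auc_mann_whitney(scores, labels):.2f}")  # AUC = 0.92
    t, sens, spec = sensitivity_at_specificity(scores, labels)
    print(f"threshold = {t}, sensitivity = {sens:.2f}, specificity = {spec:.2f}")
    # threshold = 0.5, sensitivity = 0.80, specificity = 0.80
```

On real data, scikit-learn's `roc_auc_score` gives the same AUC, and `roc_curve` returns the false-positive rates (specificity = 1 - FPR), true-positive rates, and thresholds from which such an operating point can be read off; the quadratic loop above is only meant to make the definition explicit.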

Original article: arxiv.org/abs/2605.26283
