{"slug": "architectural-bias-in-face-presentation-attack-detection-a-comparative-study-of", "title": "Architectural Bias in Face Presentation Attack Detection: A Comparative Study of Vision Transformers and Convolutional Neural Networks", "summary": "A study comparing Vision Transformers and CNNs for face presentation attack detection found that a pretrained Vision Transformer reduces demographic bias and improves accuracy. The fine-tuned DeiT-S model achieved 97.27% accuracy and narrowed the inter-ethnic ACER gap between African and East Asian subjects to 0.13%, an 83% reduction relative to the 0.75% reported in prior LBP-based work. On zero-shot Central Asian subjects its BPCER was 2.89% versus 10.44% for ResNet18, roughly 3.6x lower. The findings suggest that architectural design may partly influence fairness in biometric systems.", "body_md": "arXiv:2606.18510v1 Announce Type: new\nAbstract: Face Presentation Attack Detection (PAD) systems constitute a critical security layer in biometric authentication; however, existing approaches exhibit systematic performance disparities across demographic groups, disproportionately affecting individuals with darker skin tones. This paper presents a comparative empirical investigation of whether Vision Transformer architectures reduce demographic bias in face PAD systems relative to convolutional baselines. Experiments are conducted on the CASIA-SURF Cross-Ethnicity Face Anti-Spoofing (CeFA) dataset. Three architectures are evaluated: a Multimodal ViT-Tiny trained from scratch, a ResNet18 CNN baseline, and a pretrained DeiT-S fine-tuned on CeFA across African, East Asian, and zero-shot Central Asian demographic groups. DeiT-S achieves the highest overall accuracy of 97.27% and the lowest EER of 0.86%, outperforming ResNet18 at 90.15% accuracy. In terms of fairness, DeiT-S reduces the inter-ethnic ACER gap between African and East Asian subjects to 0.13%, compared to 0.75% reported in an LBP-based work [6], representing an 83% reduction. 
Most notably, while ResNet18 records a BPCER of 10.44% on zero-shot Central Asian subjects, DeiT-S maintains 2.89% on the same unseen group, demonstrating a 3.6x generalization advantage. These results suggest that pretrained Vision Transformers achieve superior PAD accuracy, produce smaller demographic performance gaps, and generalize more equitably across unseen demographic groups, indicating that cross-demographic fairness in PAD may partly be influenced by architectural design.", "url": "https://wpnews.pro/news/architectural-bias-in-face-presentation-attack-detection-a-comparative-study-of", "canonical_source": "https://arxiv.org/abs/2606.18510", "published_at": "2026-06-18 04:00:00+00:00", "updated_at": "2026-06-19 02:01:03.644069+00:00", "lang": "en", "topics": ["computer-vision", "machine-learning", "ai-ethics", "neural-networks", "ai-research"], "entities": ["Vision Transformer", "ResNet18", "DeiT-S", "CASIA-SURF CeFA", "ViT-Tiny"], "alternates": {"html": "https://wpnews.pro/news/architectural-bias-in-face-presentation-attack-detection-a-comparative-study-of", "markdown": "https://wpnews.pro/news/architectural-bias-in-face-presentation-attack-detection-a-comparative-study-of.md", "text": "https://wpnews.pro/news/architectural-bias-in-face-presentation-attack-detection-a-comparative-study-of.txt", "jsonld": "https://wpnews.pro/news/architectural-bias-in-face-presentation-attack-detection-a-comparative-study-of.jsonld"}}