I Benchmarked 4 Lightweight Transformers for Fault Detection. Here's What Survived.

A developer benchmarked four lightweight transformer models—DistilBERT, MobileBERT, TinyBERT-6L, and TinyBERT-4L—against traditional ML baselines for fault detection, finding that TinyBERT-4L achieved 87.8% F1 with 55 MB size and 18 ms CPU latency, nearly matching XGBoost's 87.9% F1 at 0.5 MB. MobileBERT, designed for mobile deployment, scored 0% F1 on every dataset by predicting only the majority class. The most promising result came from combining models, with all code and results published on GitHub.
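A 0% F1 from majority-class predictions is easy to reproduce on paper. The sketch below is illustrative only: it assumes F1 is computed for the fault class and that faults are the minority class, neither of which is spelled out here, and the labels are made-up toy data rather than benchmark output. The helper names `f1_for_class` and `is_collapsed` are hypothetical.

```python
def f1_for_class(y_true, y_pred, positive=1):
    """F1 for a single class, computed from raw counts (returns 0.0 when there are no true positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def is_collapsed(y_pred):
    """True when the model emits a single class for every sample."""
    return len(set(y_pred)) == 1


# Toy data (assumption): 90 normal samples (0) and 10 faults (1).
y_true = [0] * 90 + [1] * 10
# A collapsed model that always predicts the majority class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
fault_f1 = f1_for_class(y_true, y_pred, positive=1)

print(f"accuracy={accuracy:.2f} fault_f1={fault_f1:.2f} collapsed={is_collapsed(y_pred)}")
```

Under these assumptions accuracy still looks respectable while fault-class F1 is zero, which is why a check like `is_collapsed` is worth running before trusting any single headline metric.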

Everyone talks about deploying ML on edge devices. Very few people show what happens when you actually try.

I ran a full benchmark of four lightweight transformer models (DistilBERT, MobileBERT, TinyBERT-6L, and TinyBERT-4L) against traditional ML baselines on three real-world fault detection datasets. All experiments ran on a T4 GPU with consistent hyperparameters.

| Model | F1 | Size | CPU Latency |
|---|---|---|---|
| XGBoost | 87.9% | 0.5 MB | 0.002 ms |
| TinyBERT-4L | 87.8% | 55 MB | 18 ms |
| DistilBERT | 87.6% | 255 MB | 138 ms |

MobileBERT, specifically designed for mobile deployment, scored 0% F1 on every dataset. It predicted the majority class for every sample across all configurations. "Designed for mobile" does not mean "works for your use case."

The most promising result came from combining models.

All code and results: https://github.com/disha8611/edge-fault-detection-benchmark

Previous research on LLM-based anomaly detection: https://arxiv.org/abs/2604.12218

Disha Patel, Software Engineer & ML Researcher. I write about engineering, on-device ML, and building systems that work in the real world.