# I Benchmarked 4 Lightweight Transformers for Fault Detection. Here's What Survived.

> Source: <https://dev.to/dishapatel8/i-benchmarked-4-lightweight-transformers-for-fault-detection-heres-what-survived-n0g>
> Published: 2026-05-31 03:36:34+00:00

Everyone talks about deploying ML on edge devices. Very few people show what happens when you actually try.

I ran a full benchmark of four lightweight transformer models (**DistilBERT, MobileBERT, TinyBERT-6L, and TinyBERT-4L**) against traditional ML baselines on three real-world fault detection datasets.

All experiments ran on a T4 GPU with consistent hyperparameters.

| Model | F1 | Size | CPU Latency |
|---|---|---|---|
| XGBoost | 87.9% | 0.5 MB | 0.002 ms |
| TinyBERT-4L | 87.8% | 55 MB | 18 ms |
| DistilBERT | 87.6% | 255 MB | 138 ms |
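For readers who want to reproduce a CPU latency column like the one above, here is a minimal timing sketch. It is not the harness used for this benchmark: the helper `median_latency_ms`, the warm-up and repeat counts, and the `toy_predict` stand-in are all assumptions made for illustration. It reports the median per-call wall-clock time.

```python
import statistics
import time
from typing import Callable, Sequence


def median_latency_ms(predict: Callable[[object], object],
                      samples: Sequence[object],
                      warmup: int = 10,
                      repeats: int = 100) -> float:
    """Median wall-clock time of a single predict() call, in milliseconds.

    Hypothetical helper: warmup/repeats are illustrative defaults,
    not the settings used in the benchmark.
    """
    # Warm-up calls so one-time costs (lazy init, cold caches) are not timed.
    for i in range(warmup):
        predict(samples[i % len(samples)])

    timings = []
    for i in range(repeats):
        sample = samples[i % len(samples)]
        start = time.perf_counter()
        predict(sample)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


# Hypothetical stand-in for a real model: flags readings above a threshold.
def toy_predict(reading: float) -> int:
    return int(reading > 0.5)


latency = median_latency_ms(toy_predict, [0.1, 0.9, 0.4])
print(f"median latency: {latency:.4f} ms")
```

Swapping `toy_predict` for a real model's single-sample inference call gives a number comparable across models, as long as every model is timed on the same machine with the same batch size.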

MobileBERT — specifically designed for mobile deployment — scored **0% F1 on every dataset**. It predicted the majority class for every sample across all configurations.

“Designed for mobile” does not mean “works for your use case.”
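A small worked example shows why a model that always predicts the majority class lands at exactly 0% F1 while still looking accurate. The sketch assumes that faults are the minority (positive) class and that F1 is computed on that class; the post does not state either, and the 90/10 split below is made up for illustration.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 for the positive class; returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # Precision and recall are both zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative data (assumption): 90 normal samples, 10 faults.
y_true = [0] * 90 + [1] * 10
# A collapsed model that predicts "normal" for every sample.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1 = f1_score_binary(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, f1={f1:.2f}")  # accuracy=0.90, f1=0.00
```

This is why accuracy alone can hide a collapsed model: it never flags a single fault, so there are no true positives and F1 is zero regardless of how imbalanced the data is.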

The most promising result came from combining models:

All code and results:

[https://github.com/disha8611/edge-fault-detection-benchmark](https://github.com/disha8611/edge-fault-detection-benchmark)

Previous research on LLM-based anomaly detection:

[https://arxiv.org/abs/2604.12218](https://arxiv.org/abs/2604.12218)

*Disha Patel — Software Engineer & ML Researcher. I write about engineering, on-device ML, and building systems that work in the real world.*
