The 3.7 kB Alarm: A Zero-Bloat Edge AI Smoke Detector in Pure C

A developer built a 3.7 kB edge AI smoke detector in pure C, using a neural network that runs on low-cost microcontrollers like the ESP32. The model monitors 12 environmental variables in parallel to detect non-linear chemical signatures of imminent fires, requiring no TensorFlow Lite or other heavy frameworks. The project achieved a zero-bloat design by quantizing weights to 8-bit integers, so the 96 first-layer weights fit in just 96 bytes.

You are building a critical IoT safety device: a smart smoke and fire detector. It monitors 12 environmental variables in parallel: temperature, humidity, TVOC, eCO₂, raw hydrogen, ethanol, barometric pressure, and five particulate matter metrics. You want a neural network to catch the non-linear chemical signatures of an imminent fire before the flames start.

Then you look at the industry standard for Edge AI. TensorFlow Lite for Microcontrollers demands megabytes of Flash, custom memory allocators, a dynamic runtime, and a dependency chain long enough to make you reconsider the whole project. On a cheap, ultra-low-power microcontroller (an ESP32, an ATtiny) those frameworks eat the silicon and leave nothing for the WiFi stack or peripheral control. The hardware isn't the problem.

No synthetic data here. We used the Smoke Detection Dataset, originally collected in the field at 1 Hz by German researcher Stefan Blattmann for his project Real-time Smoke Detection with AI-based Sensor Fusion, and later curated and published on Kaggle by Dataset Grandmaster Deep Contractor. Blattmann's setup captured real environmental readings under real conditions: normal rooms, controlled wood-burning tests, outdoor air, active indoor smoke formation. His goal was the same as ours: prove that sensor fusion is more reliable for saving lives than a single optical or ionization detector that trips on kitchen smoke.

The dataset provides 12 environmental features: temperature, humidity, TVOC, eCO₂, raw H2, raw ethanol, pressure, PM1.0, PM2.5, NC0.5, NC1.0, and NC2.5.

A note on hardware: 12 features does not mean 12 pins. In a real deployment, temperature and humidity come from a single sensor: a DHT22 (1 data pin) or an SHT31 on the shared I2C bus. PM1.0 through NC2.5 come from one particle sensor over UART. eCO₂ and TVOC come from a single SGP30 over I2C. The full sensor stack runs on 4–6 physical pins on an ESP32, sharing I2C and UART buses.

Raw data is a problem for a micro-network. The original CSV also contains a UTC timestamp, a row index, and a CNT column, which is a sequential counter. We dropped all three.
Timestamps and sequence counters have zero correlation with real chemistry; they cause networks to memorize ordering rather than learn physics.

No Jupyter notebooks. No Pandas. A fast, standalone parser in pure C++. It does four things:

The Cleanup. UTC, row index, and CNT are dropped. They are position artifacts that cause instant overfitting.

Min-Max Scaling. All 12 sensor readings are mapped to a strict [0.0, 1.0] range. This prevents high-magnitude features like eCO₂ from drowning out small signals like temperature fluctuations.

Target Lock. A critical bug surfaced during early testing: the binary classification column (0 = clean air, 1 = alarm) was accidentally swept into the normalization pool and corrupted into floating-point fractions. We locked the label column to ensure it stayed as hard integers.

Class Balance. The dataset is skewed: 44,757 fire events vs. 17,873 clean air samples. Without correction, the network takes the easy path: predict "Alarm" constantly and collect 71% accuracy for free. We undersampled the majority class to 17,873, producing a balanced 35,746-row dataset.

The output: a clean, normalized CSV with exactly 12 inputs and 1 binary label.

We fed 12 features into a funnel architecture: 12 → 8 → 4 → 1

    Input Layer     (12 neurons)  Normalized sensor features
         |
    Hidden Layer 1  (8 neurons)   ReLU
         |
    Hidden Layer 2  (4 neurons)   ReLU
         |
    Output Layer    (1 neuron)    Sigmoid → probability 0.0 to 1.0

By quantizing weights from 32-bit floats down to 8-bit signed integers (int8_t), the first-layer weights fit in exactly 96 bytes (8 × 12), and the weights of all three layers total 132 bytes (96 + 32 + 4).

First training run looked great on paper. Wrong. During validation, the model output the exact same value for every input. Because the raw dataset was skewed roughly 2.5:1 toward fire events, the network found the easy path: freeze the biases, say "Alarm" constantly, collect the accuracy applause from the loss function.
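The arithmetic behind that shortcut is easy to reproduce. The sketch below is not the project's preprocessor; it only plugs the two class counts quoted above into a couple of illustrative helper functions (the names `majority_baseline`, `balanced_rows`, and `print_class_report` are made up for this example) to show what a constant "Alarm" predictor scores and how many rows survive undersampling.

```c
#include <assert.h>
#include <stdio.h>

/* Class counts quoted in the article for the raw dataset. */
#define FIRE_ROWS  44757L
#define CLEAN_ROWS 17873L

/* Accuracy of a degenerate model that always predicts the majority class. */
static double majority_baseline(long majority, long minority) {
    assert(majority > 0 && minority >= 0);
    return (double)majority / (double)(majority + minority);
}

/* Rows left after undersampling the larger class down to the smaller one. */
static long balanced_rows(long class_a, long class_b) {
    long kept = class_a < class_b ? class_a : class_b;
    return 2 * kept;
}

/* Illustrative helper: prints the figures for the counts above. */
static void print_class_report(void) {
    printf("constant-alarm accuracy: %.2f%%\n",
           100.0 * majority_baseline(FIRE_ROWS, CLEAN_ROWS));
    printf("imbalance ratio: %.2f:1\n",
           (double)FIRE_ROWS / (double)CLEAN_ROWS);
    printf("rows after undersampling: %ld\n",
           balanced_rows(FIRE_ROWS, CLEAN_ROWS));
}
```

Calling `print_class_report()` from a `main` reports a constant-alarm accuracy of 71.46%, an imbalance ratio of 2.50:1, and 35,746 rows after undersampling. Any accuracy figure near 71% on the raw data is therefore no evidence that the network has learned anything.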
The fix: aggressive data shuffle, class balancing in the preprocessor, and a lower learning rate to keep weights out of the dead zones (the saturated tails) of the Sigmoid activation.

A subtler problem took longer to catch. The CNT column, a sequential row counter, was included as a feature in early runs. The network learned to use position in the dataset as a proxy for class membership. The results looked excellent. They were meaningless. CNT was dropped and the entire preprocessing pipeline was rebuilt from scratch.

Training was done with Hasaki 刃先 v3.2.0, a CLI C++ tool for desktop neural network training that exports standalone C headers for firmware deployment. No Python, no runtime, no dependencies.

The model converged at epoch 2,013. Final validation loss: 0.000008.

After training, we evaluated the model against a held-out test set of 7,150 samples, data the model never touched during training or early stopping.

Confusion Matrix:

                    Predicted clean   Predicted alarm
    Actual clean         3547                4
    Actual fire             1             3598

Accuracy: 99.9301%

Breaking this down:

True negatives (clean air, no alarm): 3,547
False positives (clean air, alarm raised): 4
True positives (fire, alarm raised): 3,598
False negatives (fire, no alarm): 1

That last number. One missed fire event out of 3,599. 99.97% sensitivity. In a smoke detector, a false positive wakes someone up at 2 AM. A false negative lets a fire go undetected. This network achieved 99.97% sensitivity on data it had never seen, from a header that is 3.7 kB on disk.

The end product is a single, self-contained C header: alarm_model.h. The entire file (weights, biases, quantization scales, activation functions, and inference code) is 3.7 kB on disk. Generated by Hasaki v3.2.0:

```c
#ifndef ALARM_MODEL_H
#define ALARM_MODEL_H

#include <math.h>
#include <stdint.h>

// Generated by hasaki v3.2.0
// Architecture: 12-8-4-1 | Quantization: int8

static inline float relu(float x) { return x > 0.0f ? x : 0.0f; }
static inline float sigmoid(float x) { return 1.0f / (1.0f + expf(-x)); }

// 96 Quantized Weights mapping the 12 sensors
const int8_t w1[8][12] = { ... };
const float b1[8] = { ... };
const float w1_scales[8] = { ... };
// ... Layer 2 and 3 definitions ...

static inline void predict(const float* input, float* output) {
    float a[8], b[8];

    // Layer 1: INT8 Symmetric Quantized Dot Product + Scale + ReLU
    for (int i = 0; i < 8; i++) {
        float dot = 0.0f;
        for (int j = 0; j < 12; j++) dot += (float)w1[i][j] * input[j];
        a[i] = relu(b1[i] + dot * w1_scales[i]);
    }

    // Layer 2
    for (int i = 0; i < 4; i++) {
        float dot = 0.0f;
        for (int j = 0; j < 8; j++) dot += (float)w2[i][j] * a[j];
        b[i] = relu(b2[i] + dot * w2_scales[i]);
    }

    // Layer 3: Sigmoid output
    float dot = 0.0f;
    for (int j = 0; j < 4; j++) dot += (float)w3[0][j] * b[j];
    output[0] = sigmoid(b3[0] + dot * w3_scales[0]);
}

#endif
```

When compiled, the model adds roughly 2.3 kB to the microcontroller's Flash. Most of that is linking expf from libm for the Sigmoid. Grow the network and you don't pay this cost again.

No heap allocations. No malloc. No dynamic object trees. Flat array indexing that runs in microseconds.

The header drops directly into any C/C++ firmware. The predict function writes its result into an output array, one float per output neuron.

```c
#include "alarm_model.h"

#define ALARM_PIN 3
#define TRIGGER_THRESHOLD 3

// Normalization ranges from training data
const float INPUT_MIN[12] = {-22.01f, 10.74f, 0.0f, 400.0f, 10668.0f, 15317.0f,
                             930.852f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f};
const float INPUT_MAX[12] = {59.93f, 75.2f, 60000.0f, 60000.0f, 13803.0f, 21410.0f,
                             939.861f, 14333.69f, 45432.26f, 61482.03f, 51914.68f, 30026.438f};

void loop() {
    // 1. Read sensors (the read_*() calls stand in for the sensor drivers)
    float input[12] = {
        read_temp(), read_hum(), read_tvoc(), read_eco2(),
        read_rawh2(), read_ethanol(), read_pressure(),
        read_pm1(), read_pm25(), read_nc05(), read_nc1(), read_nc25()
    };

    // 2. Normalize to [0.0, 1.0]
    for (int i = 0; i < 12; i++)
        input[i] = (input[i] - INPUT_MIN[i]) / (INPUT_MAX[i] - INPUT_MIN[i]);

    // 3. Inference
    float output[1];
    predict(input, output);

    // 4. State accumulator: 3 consecutive positives required to trigger
    static int accumulator = 0;
    if (output[0] >= 0.5f) {
        if (accumulator < TRIGGER_THRESHOLD) accumulator++;
    } else {
        if (accumulator > 0) accumulator--;
    }

    // 5. Drive alarm pin
    digitalWrite(ALARM_PIN, accumulator >= TRIGGER_THRESHOLD ? HIGH : LOW);

    delay(1000);  // 1 Hz, matches dataset sample rate
}
```

Starting from zero, the state accumulator needs 3 consecutive predictions at or above 0.5 before it triggers the alarm, and it decrements on clean readings, so an isolated spike decays instead of latching. A real fire doesn't disappear in 3 seconds. An isolated sensor spike does. With 99.97% sensitivity, 3 consecutive false negatives is essentially impossible.

When something runs slow or takes too much space, the reflex is to upgrade the chip, buy more cloud credits, or add RAM. We wrote our own parser, caught two data bugs before they made it into print, and ended up with a 3.7 kB inference engine running on hardware that costs less than a cup of coffee. The sensor fusion approach Stefan Blattmann set out to validate (that multi-variable chemical signatures outperform single-sensor smoke detectors) holds up under scrutiny. The silicon was never the constraint.

Hasaki 刃先 companion repository available at github.com/AlexRosito67/hasaki-smoke-detector
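For readers who want to re-derive the evaluation figures quoted earlier, the four confusion-matrix counts are enough. The sketch below is an illustration, not part of the generated header: the `confusion_t` type and the function names are hypothetical, the layout assumes rows are the actual class and columns the predicted class, and the consecutive-miss estimate additionally assumes that misses on successive 1 Hz samples are independent, which real sensor data may not satisfy.

```c
#include <assert.h>
#include <stdio.h>

/* Confusion-matrix counts (rows = actual class, columns = predicted class). */
typedef struct {
    long tn; /* clean air, no alarm      */
    long fp; /* clean air, alarm raised  */
    long fn; /* fire, no alarm (missed)  */
    long tp; /* fire, alarm raised       */
} confusion_t;

static double accuracy(confusion_t c) {
    long total = c.tn + c.fp + c.fn + c.tp;
    assert(total > 0);
    return (double)(c.tn + c.tp) / (double)total;
}

/* Fraction of real fire samples that raised the alarm. */
static double sensitivity(confusion_t c) {
    assert(c.tp + c.fn > 0);
    return (double)c.tp / (double)(c.tp + c.fn);
}

/* Fraction of clean-air samples that stayed silent. */
static double specificity(confusion_t c) {
    assert(c.tn + c.fp > 0);
    return (double)c.tn / (double)(c.tn + c.fp);
}

/* Chance of k misses in a row, ASSUMING each sample is missed independently. */
static double consecutive_miss_prob(confusion_t c, int k) {
    double miss = 1.0 - sensitivity(c);
    double p = 1.0;
    for (int i = 0; i < k; i++) p *= miss;
    return p;
}

/* Illustrative helper: prints all four figures as percentages or raw odds. */
static void print_metrics(confusion_t c) {
    printf("accuracy:    %.4f%%\n", 100.0 * accuracy(c));
    printf("sensitivity: %.2f%%\n", 100.0 * sensitivity(c));
    printf("specificity: %.2f%%\n", 100.0 * specificity(c));
    printf("3 misses in a row: %.3g\n", consecutive_miss_prob(c, 3));
}
```

With the counts from the test set (`{3547, 4, 1, 3598}`), accuracy comes out at 99.9301% and sensitivity at 99.97%, matching the figures in the text, and the independent-miss estimate for three misses in a row is on the order of 1e-11.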