# The 3.7 kB Alarm: A Zero-Bloat Edge AI Smoke Detector in Pure C

> Source: <https://dev.to/alexrosito67/the-37-kb-alarm-a-zero-bloat-edge-ai-smoke-detector-in-pure-c-2ii2>
> Published: 2026-05-28 18:56:20+00:00

You are building a critical IoT safety device — a smart smoke and fire detector.

It monitors 12 environmental variables in parallel: temperature, humidity, TVOC, eCO2, raw hydrogen, ethanol, barometric pressure, and five particulate matter metrics.

You want a neural network to catch the non-linear chemical signatures of an imminent fire before the flames start.

Then you look at the industry standard for Edge AI.

TensorFlow Lite for Microcontrollers brings an interpreter, a tensor arena you size by hand, an operator resolver, and a dependency chain long enough to make you reconsider the whole project. Depending on which kernels get linked in, that costs tens to hundreds of kilobytes of Flash before your own code. On a cheap, ultra-low-power microcontroller (an ESP32, an ATtiny) that overhead competes with the WiFi stack and peripheral control for the same silicon.

The hardware isn't the problem.

No synthetic data here. We used the **Smoke Detection Dataset** — originally collected in the field at 1 Hz by German researcher Stefan Blattmann for his project *Real-time Smoke Detection with AI-based Sensor Fusion*, and later curated and published on Kaggle by Dataset Grandmaster Deep Contractor.

Blattmann's setup captured real environmental readings under real conditions: normal rooms, controlled wood-burning tests, outdoor air, active indoor smoke formation. His goal was the same as ours: prove that sensor fusion is more reliable for saving lives than a single optical or ionization detector that trips on kitchen smoke.

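The dataset provides 12 environmental features: temperature, humidity, TVOC, eCO₂, raw H₂, raw ethanol, barometric pressure, PM1.0, PM2.5, NC0.5, NC1.0, and NC2.5.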

A note on hardware: 12 features does not mean 12 pins. In a real deployment, Temperature + Humidity come from a single DHT22 (one data pin) or SHT31 (on the I2C bus). PM1.0 through NC2.5 come from one particle sensor over UART. eCO₂ + TVOC come from a single SGP30 over I2C. The full sensor stack runs on 4–6 physical pins on an ESP32, sharing I2C and UART buses.

Raw data is a problem for a micro-network. The original CSV also contains a UTC timestamp, a row index, and a **CNT column** (a sequential sample counter). We dropped all three. Timestamps and sequence counters carry no chemical information; they only record when a row was captured, so a network that sees them memorizes ordering rather than learning physics.

No Jupyter notebooks. No Pandas. A fast, standalone parser in pure C++.

Four things it does:

**The Cleanup.** UTC, row index, and CNT are dropped. They are position artifacts that cause instant overfitting.

**Min-Max Scaling.** All 12 sensor readings mapped to a strict [0.0, 1.0] boundary. This prevents high-magnitude features like eCO₂ from drowning out small signals like temperature fluctuations.

**Target Lock.** A critical bug surfaced during early testing: the binary classification column (0 = clean air, 1 = alarm) was accidentally swept into the normalization pool and corrupted into floating-point fractions. We locked the label column to ensure it stayed as hard integers.

**Class Balance.** The dataset is skewed — 44,757 fire events vs. 17,873 clean air samples. Without correction, the network takes the easy path: predict "Alarm" constantly and collect 71% accuracy for free. We undersampled the majority class to 17,873, producing a balanced 35,746-row dataset.
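The 71% figure and the balanced row count follow directly from the class counts. The sketch below shows both steps under stated assumptions: the function names are illustrative, and the random selection by partial Fisher-Yates shuffle is an assumed method, since the article does not say how the preprocessor picks which majority rows to keep.

```c
#include <stddef.h>
#include <stdint.h>

/* Accuracy of a classifier that always predicts the majority class. */
static double majority_baseline(size_t n_major, size_t n_minor) {
    return (double)n_major / (double)(n_major + n_minor);
}

/* Small xorshift PRNG so the sketch does not depend on RAND_MAX.
 * The state must be nonzero. */
static uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Partial Fisher-Yates shuffle: after the call, idx[0..keep-1] holds
 * `keep` distinct entries drawn from the original idx[0..n-1].
 * Requires keep <= n. The modulo adds a slight bias that is
 * negligible at this scale. */
static void undersample(size_t *idx, size_t n, size_t keep, uint32_t seed) {
    uint32_t state = seed ? seed : 1u;
    for (size_t i = 0; i < keep && i < n; i++) {
        size_t j = i + (size_t)(xorshift32(&state) % (uint32_t)(n - i));
        size_t tmp = idx[i];
        idx[i] = idx[j];
        idx[j] = tmp;
    }
}
```

With the article's counts, `majority_baseline(44757, 17873)` is about 0.7146, and keeping 17,873 of the 44,757 fire rows yields 2 × 17,873 = 35,746 rows.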

The output: a clean, normalized CSV with exactly 12 inputs and 1 binary label.
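A minimal sketch of min-max scaling with the label column locked out of the normalization pool. The row-major layout, the constant names, and the `minmax_scale` function are assumptions for illustration; the article's actual parser is written in C++ and is not reproduced here.

```c
#include <stddef.h>

#define N_FEATURES 12
#define N_COLS (N_FEATURES + 1)   /* 12 features + 1 label in the last column */

/* Scale the 12 feature columns of a row-major table to [0, 1] in place
 * and record each column's min and max. The loop bound stops at
 * N_FEATURES, so the label column is never read or written. */
static void minmax_scale(float *rows, size_t n_rows,
                         float col_min[N_FEATURES],
                         float col_max[N_FEATURES]) {
    if (n_rows == 0) return;
    for (size_t c = 0; c < N_FEATURES; c++) {
        float lo = rows[c], hi = rows[c];
        for (size_t r = 1; r < n_rows; r++) {
            float v = rows[r * N_COLS + c];
            if (v < lo) lo = v;
            if (v > hi) hi = v;
        }
        col_min[c] = lo;
        col_max[c] = hi;
        float span = hi - lo;
        for (size_t r = 0; r < n_rows; r++) {
            float *v = &rows[r * N_COLS + c];
            /* A constant column has no spread; map it to 0. */
            *v = span > 0.0f ? (*v - lo) / span : 0.0f;
        }
    }
}
```

The recorded `col_min` and `col_max` values are the same ranges the firmware later needs in order to normalize live readings.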

We fed 12 features into a funnel architecture: **12 → 8 → 4 → 1**

```
[Input Layer]     12 Neurons (Normalized Sensor Features)
                        |
[Hidden Layer 1]   8 Neurons (ReLU)
                        |
[Hidden Layer 2]   4 Neurons (ReLU)
                        |
[Output Layer]     1 Neuron  (Sigmoid → Probability 0.0 to 1.0)
```
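To see how small this funnel is, the parameter count can be worked out from the layer widths alone. The helper names below are illustrative.

```c
#include <stddef.h>

/* Layer widths of the 12 -> 8 -> 4 -> 1 funnel. */
static const size_t LAYERS[] = {12, 8, 4, 1};
#define N_LAYERS (sizeof LAYERS / sizeof LAYERS[0])

/* One weight per connection between adjacent layers. */
static size_t count_weights(void) {
    size_t n = 0;
    for (size_t i = 0; i + 1 < N_LAYERS; i++)
        n += LAYERS[i] * LAYERS[i + 1];
    return n;
}

/* One bias per neuron outside the input layer. */
static size_t count_biases(void) {
    size_t n = 0;
    for (size_t i = 1; i < N_LAYERS; i++)
        n += LAYERS[i];
    return n;
}
```

That works out to 96 + 32 + 4 = 132 weights and 8 + 4 + 1 = 13 biases, 145 trainable parameters in total.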

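By quantizing weights from 32-bit floats down to 8-bit signed integers (`int8_t`), the first layer's 12 × 8 weight matrix fits in exactly 96 bytes. The two smaller layers add 32 and 4 bytes, for 132 bytes of weights in total.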
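The generated header stores one float scale per output neuron next to the `int8_t` weights. A minimal sketch of symmetric per-row quantization that would produce such a pair is shown below. The rule `scale = max|w| / 127` and the function name `quantize_row` are assumptions; the article does not state how Hasaki derives its scales.

```c
#include <stddef.h>
#include <stdint.h>

/* Quantize one row of float weights to int8 with a single symmetric
 * scale, so that w[j] is approximately q[j] * scale.
 * Assumption: scale = max|w| / 127. Returns the scale. */
static float quantize_row(const float *w, size_t n, int8_t *q) {
    float max_abs = 0.0f;
    for (size_t j = 0; j < n; j++) {
        float a = w[j] < 0.0f ? -w[j] : w[j];
        if (a > max_abs) max_abs = a;
    }
    if (max_abs == 0.0f) {               /* all-zero row: any scale works */
        for (size_t j = 0; j < n; j++) q[j] = 0;
        return 1.0f;
    }
    float scale = max_abs / 127.0f;
    for (size_t j = 0; j < n; j++) {
        float r = w[j] / scale;
        int v = (int)(r < 0.0f ? r - 0.5f : r + 0.5f); /* round to nearest */
        if (v > 127) v = 127;            /* guard against float rounding */
        if (v < -127) v = -127;
        q[j] = (int8_t)v;
    }
    return scale;
}
```

At inference time the scale is applied once per neuron, after the integer-weighted dot product, which is the `dot * w1_scales[i]` pattern in the header.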

First training run looked great on paper. Wrong. During validation, the model output the exact same value for every input. Because the raw dataset was skewed roughly 2.5:1 toward fire events (44,757 vs. 17,873), the network found the easy path: freeze the biases, say "Alarm" constantly, collect the accuracy applause from the loss function.

The fix: aggressive data shuffle, class balancing in the preprocessor, and a lower learning rate to keep weights out of the saturated regions of the Sigmoid activation, where gradients vanish.

A subtler problem took longer to catch. The CNT column — a sequential row counter — was included as a feature in early runs. The network learned to use position in the dataset as a proxy for class membership. The results looked excellent. They were meaningless. CNT was dropped and the entire preprocessing pipeline was rebuilt from scratch.

Training was done with **Hasaki 刃先 v3.2.0** — a CLI C++ tool for desktop neural network training that exports standalone C headers for firmware deployment. No Python, no runtime, no dependencies.

Configuration:

The model converged at **epoch 2,013**. Final validation loss: **0.000008**.

After training, we evaluated the model against a held-out **test set of 7,150 samples** — data the model never touched during training or early stopping.

```
Confusion Matrix:
[3547    4]
[   1 3598]

Accuracy: 99.9301%
```

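Breaking this down:

- **True negatives:** 3,547 clean-air samples correctly left silent
- **False positives:** 4 clean-air samples flagged as fire
- **True positives:** 3,598 fire events caught
- **False negatives:** 1 fire event missed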

That last number. One missed fire event out of 3,599. **99.97% sensitivity.**

In a smoke detector, a false positive wakes someone up at 2 AM. A false negative lets a fire go undetected. This network achieved 99.97% sensitivity on data it had never seen, from a 3.7 kB header that compiles to roughly 2.3 kB of Flash.
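The headline numbers can be recomputed from the four cells of the confusion matrix. This sketch assumes rows are the true class and columns the predicted class, which is the layout that matches "one missed fire event out of 3,599". The `confusion_t` type and function names are illustrative and not part of Hasaki.

```c
/* Confusion matrix cells: class 0 = clean air, class 1 = fire.
 * Assumed layout: rows are the true class, columns the prediction. */
typedef struct {
    unsigned tn, fp;   /* true clean air: predicted clean, predicted fire */
    unsigned fn, tp;   /* true fire:      predicted clean, predicted fire */
} confusion_t;

static double accuracy(confusion_t m) {
    return (double)(m.tp + m.tn) / (double)(m.tn + m.fp + m.fn + m.tp);
}

/* Share of real fires that were caught (recall). */
static double sensitivity(confusion_t m) {
    return (double)m.tp / (double)(m.tp + m.fn);
}

/* Share of clean-air samples that stayed silent. */
static double specificity(confusion_t m) {
    return (double)m.tn / (double)(m.tn + m.fp);
}

/* Share of raised alarms that were real fires. */
static double precision(confusion_t m) {
    return (double)m.tp / (double)(m.tp + m.fp);
}
```

With `confusion_t m = {3547, 4, 1, 3598};` this gives an accuracy of about 99.93%, a sensitivity of about 99.97%, and a specificity and precision of about 99.89% each.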

The end product is a single, self-contained C header: `alarm_model.h`. The entire file (weights, biases, quantization scales, activation functions, and inference code) is 3.7 kB on disk.

Generated by Hasaki v3.2.0:

```
#ifndef ALARM_MODEL_H
#define ALARM_MODEL_H

#include <math.h>
#include <stdint.h>

// Generated by hasaki v3.2.0
// Architecture: 12-8-4-1 | Quantization: int8

static inline float relu(float x)    { return x > 0.0f ? x : 0.0f; }
static inline float sigmoid(float x) { return 1.0f / (1.0f + expf(-x)); }

// 96 Quantized Weights mapping the 12 sensors
const int8_t w1[8][12] = { ... };
const float  b1[8]     = { ... };
const float  w1_scales[8] = { ... };

// ... Layer 2 and 3 definitions ...

static inline void predict(const float* input, float* output) {
    float a[8], b[8];
    // Layer 1: INT8 Symmetric Quantized Dot Product + Scale + ReLU
    for (int i = 0; i < 8; i++) {
        float dot = 0.0f;
        for (int j = 0; j < 12; j++) dot += (float)w1[i][j] * input[j];
        a[i] = relu(b1[i] + dot * w1_scales[i]);
    }
    // Layer 2
    for (int i = 0; i < 4; i++) {
        float dot = 0.0f;
        for (int j = 0; j < 8; j++) dot += (float)w2[i][j] * a[j];
        b[i] = relu(b2[i] + dot * w2_scales[i]);
    }
    // Layer 3: Sigmoid output
    float dot = 0.0f;
    for (int j = 0; j < 4; j++) dot += (float)w3[0][j] * b[j];
    output[0] = sigmoid(b3[0] + dot * w3_scales[0]);
}

#endif
```

When compiled, the model adds roughly 2.3 kB to the microcontroller's Flash. Most of that is linking `expf` from `libm` for the Sigmoid. Grow the network and you don't pay this cost again.
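If the firmware only ever compares the output against 0.5, one option is to skip the Sigmoid altogether. This is not something the generated header does. In exact arithmetic, sigmoid(z) ≥ 0.5 holds exactly when z ≥ 0, so a sign test on the pre-activation gives the same decision without calling `expf`. The sketch below uses hypothetical toy parameters (`TOY_W3`, `TOY_W3_SCALE`, `TOY_B3`) in place of the real output-layer values.

```c
#include <stdint.h>

/* Hypothetical toy parameters for a 4-input output neuron. The real
 * values live in the generated header and are not reproduced here. */
static const int8_t TOY_W3[4]    = { 64, -32, 127, -8 };
static const float  TOY_W3_SCALE = 0.05f;
static const float  TOY_B3       = -1.0f;

/* Pre-activation (logit) of the output neuron: the same arithmetic as
 * the header's Layer 3, stopping before the sigmoid. */
static float output_logit(const float h[4]) {
    float dot = 0.0f;
    for (int j = 0; j < 4; j++) dot += (float)TOY_W3[j] * h[j];
    return TOY_B3 + dot * TOY_W3_SCALE;
}

/* sigmoid is strictly increasing and sigmoid(0) = 0.5, so a fixed
 * 0.5 threshold reduces to a sign test on the logit. */
static int is_alarm(const float h[4]) {
    return output_logit(h) >= 0.0f;
}
```

For a threshold p other than 0.5, the logit would be compared against ln(p / (1 − p)), a constant that can be computed offline. Whether this actually removes `expf` from the binary depends on nothing else in the firmware linking it.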

No heap allocations. No `malloc`. No dynamic object trees. Flat array indexing that runs in microseconds.

The header drops directly into any C/C++ firmware. The `predict()` function writes its result into an output array, one float per output neuron.

```
#include "alarm_model.h"

#define ALARM_PIN         3
#define TRIGGER_THRESHOLD 3

// Normalization ranges from training data
const float INPUT_MIN[12] = {-22.01f, 10.74f,    0.0f,    400.0f,
                              10668.0f, 15317.0f, 930.852f, 0.0f,
                              0.0f,     0.0f,     0.0f,     0.0f};
const float INPUT_MAX[12] = { 59.93f,  75.2f,  60000.0f, 60000.0f,
                              13803.0f, 21410.0f, 939.861f, 14333.69f,
                              45432.26f, 61482.03f, 51914.68f, 30026.438f};

void setup() {
    pinMode(ALARM_PIN, OUTPUT); // configure the alarm pin before loop() drives it
}

void loop() {
    // 1. Read sensors
    float input[12] = {
        read_temp(), read_hum(),      read_tvoc(),     read_eco2(),
        read_rawh2(), read_ethanol(), read_pressure(),
        read_pm1(),  read_pm25(),    read_nc05(), read_nc1(), read_nc25()
    };

    // 2. Normalize to [0.0, 1.0]
    for (int i = 0; i < 12; i++)
        input[i] = (input[i] - INPUT_MIN[i]) / (INPUT_MAX[i] - INPUT_MIN[i]);

    // 3. Inference
    float output[1];
    predict(input, output);

    // 4. State accumulator: +1 per positive, -1 per clean reading; triggers at 3
    static int accumulator = 0;
    if (output[0] >= 0.5f) {
        if (accumulator < TRIGGER_THRESHOLD) accumulator++;
    } else {
        if (accumulator > 0) accumulator--;
    }

    // 5. Drive alarm pin
    digitalWrite(ALARM_PIN, accumulator >= TRIGGER_THRESHOLD ? HIGH : LOW);

    delay(1000); // 1 Hz — matches dataset sample rate
}
```
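The normalization step in the loop divides by the training-set range, so a live reading outside that range lands outside [0.0, 1.0], where the network was never trained. One defensive option is to clamp. This is an assumption added here; the article's firmware normalizes without it, and `normalize_clamped` is an illustrative name.

```c
/* Map a raw reading into [0, 1] using the training-set range and
 * clamp anything the training data never covered. */
static float normalize_clamped(float x, float lo, float hi) {
    if (hi <= lo) return 0.0f;          /* degenerate range: avoid dividing by zero */
    float v = (x - lo) / (hi - lo);
    if (v < 0.0f) return 0.0f;
    if (v > 1.0f) return 1.0f;
    return v;
}
```

In the loop, this would replace the body of the normalization `for` with `input[i] = normalize_clamped(input[i], INPUT_MIN[i], INPUT_MAX[i]);`.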

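The state accumulator needs three positive predictions (output ≥ 0.5) to trigger the alarm, and it decrements on clean readings rather than resetting. From a clean state, the fastest trigger is three positives in a row, about 3 seconds at 1 Hz. A real fire doesn't disappear in 3 seconds. An isolated sensor spike does. At 99.97% per-sample sensitivity, a run of missed readings long enough to hold the alarm off during a sustained fire is very unlikely, although consecutive samples are correlated, so per-sample miss rates cannot simply be multiplied.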
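The accumulator's behavior is easy to check on a desktop by replaying probability sequences through the same update rule. The sketch below lifts the rule out of `loop()` into a pure function; `accumulator_step` and `first_alarm_index` are illustrative names, not part of the firmware.

```c
#define TRIGGER_THRESHOLD 3

/* Same update rule as the firmware loop: count up on a positive
 * reading (saturating at the threshold), count down on a clean one. */
static int accumulator_step(int acc, float p) {
    if (p >= 0.5f) {
        if (acc < TRIGGER_THRESHOLD) acc++;
    } else if (acc > 0) {
        acc--;
    }
    return acc;
}

/* Replay a sequence of predicted probabilities. Returns the index of
 * the first reading at which the alarm is on, or -1 if it never is. */
static int first_alarm_index(const float *p, int n) {
    int acc = 0;
    for (int i = 0; i < n; i++) {
        acc = accumulator_step(acc, p[i]);
        if (acc >= TRIGGER_THRESHOLD) return i;
    }
    return -1;
}
```

Three positives in a row trigger on the third reading (index 2). A sequence such as `{0.9, 0.9, 0.1, 0.9, 0.9}` triggers at index 4, because the clean reading only subtracts one step instead of resetting the count. An isolated spike such as `{0.1, 0.9, 0.1, 0.1}` never triggers.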

When something runs slow or takes too much space, the reflex is to upgrade the chip, buy more cloud credits, or add RAM.

We wrote our own parser, caught two data bugs before they made it into print, and ended up with a 3.7 kB inference engine running on hardware that costs less than a cup of coffee. The sensor fusion approach Stefan Blattmann set out to validate — that multi-variable chemical signatures outperform single-sensor smoke detectors — holds up under scrutiny.

The silicon was never the constraint.

*Hasaki 刃先 companion repository available at github.com/AlexRosito67/hasaki-smoke-detector*