{"slug": "i-trained-a-neural-network-on-real-medical-data-and-fit-it-in-8-9-kb-of-pure-c", "title": "I Trained a Neural Network on Real Medical Data and Fit It in 8.9 kB of Pure C", "summary": "An engineer trained a neural network on 3,658 real patient records from the Framingham Heart Study and fit it into 8.9 kB of pure C code using Hasaki, a command-line tool for embedded systems. The model predicts systolic blood pressure with a mean absolute error of approximately 7.8 mmHg on the validation set; on one highlighted sample the prediction was off by 1.4 mmHg, inside the ±3 mmHg rating cited for clinical-grade monitors. The project demonstrates regression capabilities in Hasaki, which had previously been tested only on classification tasks.", "body_md": "I want to be honest with you from the start: I didn't set out to write this article.\n\nI set out to answer a question I had about my own tool — *can Hasaki do regression?* — and somewhere between loading a dataset of 3,658 real patient records and watching a microcontroller predict blood pressure from 14 numbers, I realized the answer deserved to be written down.\n\nHasaki is a command-line neural network trainer I built for embedded systems developers. You train a model on your desktop, and it exports a self-contained C header — weights, biases, and a `predict()` function — ready to drop into any MCU project. No TensorFlow. No runtime. No dependencies.\n\nEvery public demo I've done has been classification: smoke detection, MNIST digits, motion sensing. Binary or multiclass outputs. But Hasaki has a `linear` activation. And `sigmoid`. And a single-neuron output layer. The pieces for regression were always there.\n\nI just never tested them.\n\nSo I decided to test them properly — with real data, a real target, and enough honesty to document what broke along the way.\n\nI wanted something biomedical. Not because I'm building a medical device — I'm not, and I'll be clear about that — but because medical data carries weight. 
It's messy, it's human, and if the numbers are wrong, they mean something.\n\nThe Framingham Heart Study dataset on Kaggle fit the criteria: 4,240 patient records with 15 attributes — age, cholesterol, BMI, glucose, smoking habits, blood pressure — collected over decades from residents of Framingham, Massachusetts. The target most people use is `TenYearCHD`, a binary classification of 10-year heart disease risk.\n\nI ignored that. I went after `sysBP` — systolic blood pressure — a continuous value ranging from 83.5 to 295.0 mmHg. That's a regression problem.\n\nBefore Hasaki could see a single number, I had to clean the data.\n\n582 rows had missing values — about 14% of the dataset. I dropped them. That left 3,658 samples, which is workable.\n\nThen I had to normalize everything. Hasaki expects inputs in a consistent range. Raw features don't come that way: age runs 32–70, cholesterol 107–696, glucose 40–394. I used `RobustScaler` from scikit-learn — more resistant to outliers than `MinMaxScaler`, which matters when you have 3,658 people's real health data.\n\nThe target needed normalization too. With `sigmoid` as the output activation, predictions are bounded to `[0, 1]`. So I normalized `sysBP` to that range:\n\n```\ny_scaled = (target - target_min) / (target_max - target_min)\n# target_min = 83.5, target_max = 295.0\n```\n\nThis is the part that frustrates me about my own tool. A new user shouldn't need to know any of this. They should hand Hasaki a CSV and get a model. That's a v4.0.0 problem — and it's already on the roadmap.\n\nArchitecture: `14,32,16,1` — fourteen inputs, two hidden layers, one output.\n\n```\nhasaki -d 14,32,16,1 -act relu,relu,sigmoid -a train \\\n      -f framingham_hasaki.csv -e 10000 -l 0.001 --adam \\\n      -o bp_model.txt\n```\n\nIt trained in 23 seconds. Early stopping triggered at epoch 218.\n\n**Final Val Loss: 0.002420**\n\nThat number lives in normalized space. 
To make it mean something:\n\n```\nRMSE = sqrt(0.002420) × (295.0 - 83.5) ≈ 10.4 mmHg\n```\n\nThe mean absolute error across the validation set was approximately **7.8 mmHg**.\n\nFor a single sample — a male patient, age 39, non-smoker, no diabetes, cholesterol 195, BMI 26.97 — the model predicted **107.4 mmHg**. The real value was **106.0 mmHg**. Error: **1.4 mmHg**.\n\nClinical-grade blood pressure monitors are rated at ±3 mmHg.\n\nThere's something I need to flag for anyone using Hasaki for regression: the training log shows `Train Acc` and `Val Acc` alongside the loss values. During my regression run, both read `1.0000` from the very first epoch.\n\nThat's not because the model is perfect. It's because the accuracy metric uses a 0.5 threshold designed for binary classification — and in regression, nearly every prediction clears that bar trivially.\n\nThe accuracy columns are meaningless for regression. Ignore them. Watch Val Loss.\n\nThis is a known issue and it's going in the fix list for v4.0.0.\n\nAfter validating, I ran the INT8 quantization test:\n\n```\nResult: PASSED\nMean error (float):  0.037270\nMean error (INT8):   0.037251\n```\n\nPractically identical. I exported:\n\n```\nhasaki -m bp_model.txt -a export -o bp_model.h -q int8\n```\n\n**File size: 8,893 bytes — 8.9 kB.**\n\nA model that predicts systolic blood pressure from 14 patient vitals, quantized to INT8, fits in 8.9 kB of flash. No runtime. No interpreter. No Python at inference time. Just a C header and a `predict()` call.\n\n```\n#include \"bp_model.h\"\n\nfloat input[14] = { /* normalized vitals */ };\nfloat output[1];\n\npredict(input, output);\n\nfloat systolic_bp = output[0] * (295.0f - 83.5f) + 83.5f;\n```\n\nI built Hasaki for classification. I documented it for classification. Every demo I've published has been classification.\n\nRegression was never a stated feature. 
It was a consequence of having `linear` and `sigmoid` activations and a single output neuron — the right pieces in the right place.\n\nI only found out it worked by testing it.\n\nThat surprised me more than the 1.4 mmHg prediction. Not because the math is surprising — a neural network doesn't care if the target is a class label or a blood pressure reading. But because the tool had a capability I didn't know it had until I used it outside its intended purpose.\n\nThere's something worth sitting with there. We tend to define our tools by the problems we built them for. But a good tool, used honestly, sometimes reveals what it can do before we think to ask.\n\nI didn't know Hasaki could do regression.\n\nNow it's in the manual.\n\nThis is a research experiment, not a medical device. The model was trained on the Framingham Heart Study dataset for demonstration purposes only. Do not use Hasaki-trained models for clinical diagnosis without independent validation by qualified medical professionals.\n\n*Hasaki 刃先 — Small tools. 
Sharp inference.*\n\n[Hasaki 刃先 Free](https://github.com/AlexRosito67/hasaki) · [Hasaki 刃先 Pro](https://hasaki.lemonsqueezy.com)", "url": "https://wpnews.pro/news/i-trained-a-neural-network-on-real-medical-data-and-fit-it-in-8-9-kb-of-pure-c", "canonical_source": "https://dev.to/alexrosito67/i-trained-a-neural-network-on-real-medical-data-and-fit-it-in-89-kb-of-pure-c-4e40", "published_at": "2026-06-20 04:16:08+00:00", "updated_at": "2026-06-20 05:07:07.375842+00:00", "lang": "en", "topics": ["machine-learning", "neural-networks", "developer-tools"], "entities": ["Hasaki", "Framingham Heart Study", "Kaggle", "RobustScaler", "scikit-learn"], "alternates": {"html": "https://wpnews.pro/news/i-trained-a-neural-network-on-real-medical-data-and-fit-it-in-8-9-kb-of-pure-c", "markdown": "https://wpnews.pro/news/i-trained-a-neural-network-on-real-medical-data-and-fit-it-in-8-9-kb-of-pure-c.md", "text": "https://wpnews.pro/news/i-trained-a-neural-network-on-real-medical-data-and-fit-it-in-8-9-kb-of-pure-c.txt", "jsonld": "https://wpnews.pro/news/i-trained-a-neural-network-on-real-medical-data-and-fit-it-in-8-9-kb-of-pure-c.jsonld"}}