{"slug": "how-i-got-a-threat-classification-ai-running-on-agent-in-under-8ms-no-gpu-no", "title": "How I got a threat-classification AI running on-agent in under 8ms — no GPU, no cloud", "summary": "Watch Cortex, a threat-classification AI, runs on-agent in under 8ms without cloud calls or GPUs. The system uses a gradient-boosted decision tree ensemble for primary classification and a lightweight autoencoder for anomaly detection, enabling sub-10ms inference on CPU. Feature engineering with ~140 features per event provides stateful context for accurate threat detection.", "body_md": "When I tell people that Watch Cortex classifies threats in under 8ms on-agent — no cloud call, no GPU, no round-trip — the first question is usually: *how?*\n\nThe second question is: *why bother? Just send it to the cloud.*\n\nLet me answer the second one first, because it explains all the engineering decisions that follow.\n\nThe cloud-call model for security agents has a fundamental problem: it fails when you need it most.\n\nNetwork incidents, backend outages, high-latency connections — all of these happen. And they correlate with attacks. An attacker who can disrupt your monitoring before escalating isn't a theoretical threat; it's a documented technique (T1562.001 in MITRE ATT&CK).\n\nIf your security agent phones home and gets no answer, you're flying blind during an attack. That's not a tradeoff I'm willing to make.\n\nBeyond reliability: latency. A cloud round-trip is 50-200ms under good conditions. That's an eternity in an SSH brute-force sequence. Cortex needs to classify and respond before the attacker's next attempt lands — sub-second total, which means the classification step has to be under 10ms.\n\nSo: on-agent, <8ms, no GPU. Those were the constraints. Here's how I built to them.\n\nFirst, let's be precise about what Cortex is doing. It's not doing NLP. It's not running a large model. 
It's doing **behavioral event classification** — looking at structured telemetry events and deciding: is this a threat, and if so, what kind?\n\nInput: a stream of structured events — process forks, network connections, file writes, auth attempts — with context (parent process, timestamp, user, path, connection direction).\n\nOutput: a threat classification with confidence score, threat category, and recommended response action.\n\nThat framing changes the problem significantly. I'm not asking \"what does this log line mean in English?\" I'm asking \"does this pattern of events match known attack behavior?\"\n\nCortex uses a **gradient-boosted decision tree ensemble** (XGBoost, specifically) for the primary classifier, with a lightweight neural layer for anomaly scoring on top.\n\nWhy GBT instead of a neural network?\n\n**Inference speed.** A well-tuned XGBoost model with ~200 trees classifies a feature vector in under 1ms on a modern CPU. Neural networks at equivalent accuracy are 10-50x slower for structured tabular data.\n\n**No GPU required.** GBT inference is cheap CPU work: each tree is a short chain of threshold comparisons from root to leaf, and the ensemble sums the leaf values. An EC2 t3.micro can run it comfortably alongside the monitoring agent without noticeable CPU impact.\n\n**Explainability.** SHAP values let me tell the operator *exactly* which features drove the classification. That's how Cortex generates plain-language investigation summaries — not LLM-generated prose, but template-filled explanations grounded in feature importance scores.\n\n**Small model size.** The serialized Cortex model is ~1.2MB. It ships with the agent binary, pre-synced. No cold-start, no download-on-first-use.\n\nThe anomaly layer is a small autoencoder (3 layers, ~15K parameters) that learns each server's baseline behavior over the first 72 hours. It flags events that deviate from that baseline even when they don't match known attack patterns. 
This is what catches novel techniques that the GBT hasn't been trained on.\n\nThe model is the easy part. Feature engineering is where I spent 80% of the time.\n\nRaw events are useless to a classifier. What matters is the *context* around an event — the temporal patterns, the process ancestry, the prior history of the entities involved.\n\nCortex computes ~140 features per event. A few illustrative examples:\n\n**Process ancestry features:**\n\n**Network features:**\n\n**Temporal features:**\n\n**File integrity features:**\n\nThe key insight: most of these features require **stateful context**, not just the current event. The agent maintains an in-memory state store — process tables, connection history, auth attempt logs, file write history — that the feature extractor queries in microseconds. This is why the agent runs as a persistent daemon rather than a per-event script.\n\nHere's where the time actually goes:\n\n| Step | Time |\n|---|---|\n| Event receipt from kernel (eBPF probe) | ~0.1ms |\n| State store lookup + feature extraction | ~1.5ms |\n| GBT inference (XGBoost, 200 trees) | ~0.8ms |\n| Anomaly score (autoencoder) | ~1.2ms |\n| Threat category resolution + confidence calibration | ~0.3ms |\n| Response decision + action dispatch | ~0.5ms |\n| SHAP explanation generation | ~3.5ms |\n| **Total** | **~8ms** |\n\nSHAP generation is surprisingly expensive — it's the largest chunk. In a future version I may cache SHAP values for common event types and only run full SHAP on novel patterns. But 8ms total is fast enough that I haven't prioritized it.\n\nThe eBPF kernel probes are the other interesting piece. Cortex uses a small eBPF program (compiled with libbpf) attached to kprobes for `execve`, `connect`, `openat`, and a handful of others. The probe captures the raw event and writes it to a ring buffer; the userspace agent reads the ring buffer in a tight loop. 
This gives sub-millisecond event delivery from kernel to userspace — much faster than reading audit logs from `/var/log/audit/`.\n\nA model is only as good as its training data, and training data for Linux attack behavior is genuinely hard to get.\n\nI ended up with four sources:\n\n**Public datasets.** DARPA VAST, CERT Insider Threat, CIC-IDS2017/2018. These are academic datasets with labeled attack traffic. Useful for broad coverage, but they're old and the attack patterns don't match modern techniques.\n\n**Honeypots.** I run a small fleet of intentionally vulnerable Linux VMs (minimal hardening, weak SSH passwords) exposed to the public internet. They get attacked constantly. I log everything and use it as labeled attack data after manual review.\n\n**Red team exercises.** I've run controlled red team scenarios against test VMs — mimicking common MITRE ATT&CK techniques — and captured the resulting telemetry as positive training examples.\n\n**Production negatives.** Telemetry from normal server operation — cron jobs, package installs, legitimate SSH sessions, monitoring agents — gives me the negative class (normal behavior). This is the largest portion of the training set by volume.\n\nThe hardest problem: class imbalance. In production, attacks are rare events. A naive classifier learns to just say \"not attack\" and achieves 99.9% accuracy, which is useless. Cortex uses SMOTE oversampling on the minority class during training, plus a heavily tuned decision threshold that optimizes for false-negative minimization rather than accuracy. 
I'd rather have a false positive (unnecessary alert) than a false negative (missed attack).\n\nWhen Cortex detects and confirms a novel threat pattern on one agent, it extracts a compact threat signature: a vector of the most discriminative features that characterized the attack.\n\nThis signature is broadcast to all other agents in the fleet over an encrypted WebSocket connection to the backend, which fans it out immediately. Each receiving agent adds the signature to its local threat library.\n\nThe signature is not the full model — it's a set of rules derived from feature importance: \"if source IP is in this /24, and auth failure rate exceeds X/min, and targeted usernames include 'admin' or 'root', classify as brute force with 0.95 confidence.\"\n\nThese derived rules are fast to evaluate — microseconds, not milliseconds — and supplement the GBT classifier for known-active attack campaigns.\n\nWhen a human operator corrects a Cortex decision (false positive or false negative), the correction is also broadcast fleet-wide. The correction adjusts the confidence calibration for that threat category and, if it's a false positive on a specific process/path combination, adds it to a server-specific allowlist that propagates to similar servers in the fleet (matched by OS version and installed packages).\n\nA few things I had to unlearn:\n\n**I started with a larger model.** My first attempt used a 1,000-tree ensemble with deeper trees and more features. It was more accurate on benchmarks. It was also 40ms inference time, which broke the latency requirement. Ruthlessly pruning to 200 shallower trees while maintaining accuracy was a week of work.\n\n**I underestimated feature extraction time.** I assumed feature extraction was trivial. It's not — especially the temporal features that require querying rolling windows over the state store. 
Most of my latency wins came from optimizing the state store (switched from SQLite to a hand-rolled ring-buffer structure in memory) rather than the model itself.\n\n**I tried to make the model explain itself in prose.** My first attempt at investigation summaries used a small language model to generate natural-language explanations from the feature values. It added 50ms and the explanations were worse than what I ended up with: structured templates filled in by SHAP feature importance. \"High-frequency SSH auth failures from new IP (3,800 attempts / 4 min)\" is more useful than a paragraph.\n\nA few things on the roadmap:\n\n**Per-server model fine-tuning.** Right now Cortex ships one global model and adapts at inference time using the anomaly layer. Long-term, I want to fine-tune the GBT on each server's specific behavior profile after a 30-day baseline period.\n\n**eBPF program hot-reload.** Currently, updating the kernel probes requires an agent restart. I'm working on a mechanism to push updated eBPF programs without dropping the ring buffer or interrupting monitoring.\n\n**Threat intelligence federation.** Beyond fleet immune memory, I'm looking at integrating with external threat intel feeds (VirusTotal, AbuseIPDB, Shodan) to supplement the classifier's context for external IPs and file hashes.\n\nIf you're building something in this space — autonomous security agents, on-device ML inference, eBPF-based monitoring — I'm happy to trade notes. Drop a comment or reach out directly.\n\n[Watch Cortex](https://watch.alsopss.com) — 14-day free trial, no credit card.\n\n*Built by AL'S-OPS LLC. 
Feedback and security disclosures: security@alsopss.com.*", "url": "https://wpnews.pro/news/how-i-got-a-threat-classification-ai-running-on-agent-in-under-8ms-no-gpu-no", "canonical_source": "https://dev.to/alsops/how-i-got-a-threat-classification-ai-running-on-agent-in-under-8ms-no-gpu-no-cloud-4cge", "published_at": "2026-06-15 18:16:36+00:00", "updated_at": "2026-06-15 18:36:46.885392+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "ai-agents", "ai-safety", "ai-infrastructure"], "entities": ["Watch Cortex", "XGBoost", "MITRE ATT&CK", "SHAP", "EC2 t3.micro"], "alternates": {"html": "https://wpnews.pro/news/how-i-got-a-threat-classification-ai-running-on-agent-in-under-8ms-no-gpu-no", "markdown": "https://wpnews.pro/news/how-i-got-a-threat-classification-ai-running-on-agent-in-under-8ms-no-gpu-no.md", "text": "https://wpnews.pro/news/how-i-got-a-threat-classification-ai-running-on-agent-in-under-8ms-no-gpu-no.txt", "jsonld": "https://wpnews.pro/news/how-i-got-a-threat-classification-ai-running-on-agent-in-under-8ms-no-gpu-no.jsonld"}}