I skipped the LLM and built a 9-rule deterministic diagnosis engine for my performance monitoring SaaS

A developer built CacheSnap, a performance monitoring SaaS that probes URLs from 8 AWS Lambda regions and uses a 9-rule deterministic diagnosis engine instead of an LLM to identify CDN caching issues. The engine provides actionable fixes in sub-millisecond time, avoiding the latency, correctness, and testability problems of LLM-based approaches.

Most developers don't know if their CDN is actually caching in Tokyo. They check the dashboard, see a green dot, and assume everything is fine. Meanwhile, every request from Asia is hitting their origin in Frankfurt because a CDN config never propagated. TTFB is 800ms. Users are leaving. Nobody noticed because the uptime monitor only checks "is the site up?", not "is it fast from where your users actually are?"

That's what I built CacheSnap (https://cachesnap.com) to fix. It probes your URLs from 8 AWS Lambda regions every few minutes and, instead of showing you raw headers, tells you what's wrong and what to do about it.

This post is about the two pieces of engineering that make that work: the deterministic diagnosis engine and the Redis-gated scheduler.

Before getting into implementation, it helps to understand what we're targeting. When CacheSnap detects a problem, a card like this appears on the dashboard:

```
⚠ CRITICAL · sa-east São Paulo
Cache MISS: origin server is being consulted for every request in sa-east.
Action: Add Cache-Control: public, s-maxage=300 to your response headers.
        For Next.js, use export const revalidate = 300 in your page.
Estimated gain: ~450ms
```

No headers to decode. No raw JSON to interpret. The cause is a sentence. The fix is two lines of config. The gain is a number.

Getting from raw probe data to that card is the job of the diagnosis engine. A probe returns something like this:

```
TTFB:         480ms
Cache-Status: MISS
HTTP:         HTTP/2
Redirects:    0
Served-By:    87c1d4a2b3c4d5e6-IAD   ← Cloudflare CF-Ray header
```

That's a measurement. It tells you what happened, not why or what to do. The gap between "480ms TTFB" and "your CDN isn't caching: here's the exact config line to fix it" is where most monitoring tools stop.

The obvious path is to feed the data to an LLM and let it generate the explanation. I spent a week thinking seriously about this and decided against it. Three reasons:

1. Volume and latency. Diagnosis runs on every probe ingest. With 50 monitors × 8 regions × 1-minute intervals, that's 400 diagnosis calls per minute at steady state. An LLM call averaging 800ms would add more latency to the pipeline than the performance problems it's diagnosing. Diagnosis needs to be sub-millisecond.
2. Correctness. An LLM will generate plausible advice regardless of whether it's applicable. It might say "try adding a Cache-Control header" when one already exists and the problem is a CDN misconfiguration. A rule engine is wrong in known, fixable ways: you can write a test for every mistake it makes.
3. Testability. I want `diagnose(input)` to be a pure function with deterministic output I can run in CI. The priority between rules is a product decision: "cache MISS beats anycast mismatch" is something I can assert and lock down. With an LLM that's not possible.

The alternative: a priority-ordered rule table. Each rule maps an observable condition to a structured diagnosis. Rules evaluate top-to-bottom; first match wins.

The core types:

```rust
pub struct DiagnosisInput {
    pub ttfb_ms: Option<f64>,
    pub cache_status: Option<String>,
    pub baseline_ttfb_ms: Option<f64>, // 7-day rolling average for this URL + region
    pub redirect_count: Option<i32>,
    pub http_version: Option<String>,
    pub region: String,
    pub error: Option<String>,
    pub served_by: Option<String>, // CF-Ray header, used for anycast audit
    pub age_s: Option<i32>,        // Age response header
}

pub struct Diagnosis {
    pub severity: String, // "critical" | "warning" | "info" | "ok"
    pub cause: String,
    pub action: String,
    pub summary: String,
    pub estimated_gain_ms: Option<f64>,
}
```

The `diagnose` function evaluates 9 rules in fixed priority order:

1. Connectivity error → critical (site unreachable)
2. Cache MISS/BYPASS + TTFB > 200ms → critical (highest actionable impact)
3. TTFB > 2× 7-day baseline → warning (regression vs. normal)
4. Redirect count > 1 → warning (redirect chain cost)
5. Cache HIT but TTFB > 150ms → warning (slow edge function)
6. HTTP/1.1 + TTFB > 100ms → info (upgrade available)
7. Cache Age > 86400s → warning (stale content risk)
8. Anycast routing mismatch → warning (cross-region routing)
9. (fallthrough) → ok

Here's rule 2 verbatim from the source:

```rust
// The cache-status checks are grouped so the TTFB threshold applies to all of them.
if (cache.contains("MISS")
    || cache.contains("BYPASS")
    || cache.contains("EXPIRED")
    || cache.contains("DYNAMIC"))
    && ttfb > 200.0
{
    let gain = ttfb - 30.0; // estimate: a cache HIT would cost ~30ms at the edge
    return Diagnosis {
        severity: "critical".into(),
        cause: format!(
            "Cache {}: origin server is being consulted for every request in {}.",
            cache, input.region
        ),
        action: "Add Cache-Control: public, s-maxage=300 to your response headers. \
                 For Next.js, use export const revalidate = 300 in your page."
            .into(),
        summary: format!("Cache {} in {} is adding ~{:.0}ms", cache, input.region, gain),
        estimated_gain_ms: Some(gain),
    };
}
```

The `estimated_gain_ms` field is worth pausing on. "Your site is slow" is vague. "Fixing this saves 450ms in São Paulo" is a business case. The number is an estimate (actual gain depends on origin latency after the fix), but even a rough estimate turns a warning into a prioritizable action.

The order isn't arbitrary. Rules 2–4 surface "fix this now" problems. Rules 7–8 are audit signals: real issues, but lower urgency than a production cache miss that's affecting every user right now.

Without explicit priority, overlapping signals create ambiguity. A cache MISS and an anycast routing mismatch can both be true simultaneously. The engine needs to surface the most actionable one.
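
One way to make "first match wins" structurally explicit is to keep the rules in an ordered slice and return the first one that fires. The sketch below is not CacheSnap's source: `Input`, `Finding`, `RULES`, and the three sample rules are hypothetical names for illustration, with thresholds borrowed from the rule list above.

```rust
/// Hypothetical, trimmed-down input: only the fields the sample rules need.
#[derive(Debug, Clone, Default)]
struct Input {
    ttfb_ms: Option<f64>,
    cache_status: Option<String>,
    error: Option<String>,
    redirect_count: Option<i32>,
}

/// Hypothetical result type: which rule fired and how severe it is.
#[derive(Debug, PartialEq)]
struct Finding {
    rule: &'static str,
    severity: &'static str,
}

/// A rule inspects the input and either fires or passes.
type Rule = fn(&Input) -> Option<Finding>;

fn connectivity(i: &Input) -> Option<Finding> {
    i.error.as_ref().map(|_| Finding { rule: "connectivity", severity: "critical" })
}

fn cache_miss(i: &Input) -> Option<Finding> {
    let cache = i.cache_status.as_deref()?;
    let ttfb = i.ttfb_ms?;
    let uncached = ["MISS", "BYPASS", "EXPIRED", "DYNAMIC"]
        .iter()
        .any(|s| cache.contains(*s));
    if uncached && ttfb > 200.0 {
        Some(Finding { rule: "cache_miss", severity: "critical" })
    } else {
        None
    }
}

fn redirect_chain(i: &Input) -> Option<Finding> {
    match i.redirect_count {
        Some(n) if n > 1 => Some(Finding { rule: "redirect_chain", severity: "warning" }),
        _ => None,
    }
}

/// Position in this slice is the priority.
const RULES: &[Rule] = &[connectivity, cache_miss, redirect_chain];

/// Evaluate rules top to bottom and return the first match, or "ok".
fn diagnose(i: &Input) -> Finding {
    RULES
        .iter()
        .find_map(|rule| rule(i))
        .unwrap_or(Finding { rule: "fallthrough", severity: "ok" })
}

fn main() {
    // Both the cache-miss and redirect rules match; the earlier one wins.
    let both = Input {
        ttfb_ms: Some(500.0),
        cache_status: Some("MISS".into()),
        redirect_count: Some(3),
        ..Default::default()
    };
    let d = diagnose(&both);
    println!("{} ({})", d.rule, d.severity);
}
```

With this shape, priority is nothing more than position in `RULES`, so reordering two entries is the entire change and an ordering assertion catches it immediately.
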
The tests lock this down:

```rust
// Cache MISS must win over anycast mismatch (rule 2 > rule 8)
#[test]
fn cache_miss_beats_anycast() {
    let d = diagnose(&DiagnosisInput {
        ttfb_ms: Some(500.0),
        cache_status: Some("MISS".into()),
        served_by: Some("abc123-IAD".into()),
        region: "sa-east".into(),
        ..base_input()
    });
    assert_eq!(d.severity, "critical");
    assert!(d.cause.contains("MISS"));
}

// Baseline regression must win over stale cache age (rule 3 > rule 7)
#[test]
fn baseline_anomaly_beats_cache_age_breach() {
    let d = diagnose(&DiagnosisInput {
        ttfb_ms: Some(800.0),
        cache_status: Some("HIT".into()),
        baseline_ttfb_ms: Some(100.0), // 8× slower than normal
        age_s: Some(172_800),          // content also 2 days old
        ..base_input()
    });
    assert_eq!(d.severity, "warning");
    assert!(d.cause.contains("7-day baseline"));
}
```

These tests document the intended priority as much as they verify correctness. When I change a rule's position, a failing test tells me exactly what got displaced and forces an explicit decision about whether that's right.

Rule 8 is the most unusual and the one users consistently didn't know they had. CDNs use anycast to route requests to the nearest PoP. When it works, a user in São Paulo gets served by GRU or GIG. When something is misconfigured (geo-steering rules, load balancer health checks, origin pull settings), the same request travels to IAD (Dulles, Virginia) instead. That's an extra 100–150ms RTT on every request, invisible unless you're explicitly probing from the right region.

Cloudflare exposes which PoP served a request via the CF-Ray header: `87c1d4a2b3c4d5e6-GRU`. The suffix is the IATA airport code of the serving PoP. The engine extracts it and checks whether that PoP belongs to the probe's expected region:

```rust
pub fn extract_iata_from_served_by(served_by: &str) -> Option<&str> {
    // CF-Ray format: hexdigest-IATA
    if let Some(pos) = served_by.rfind('-') {
        let candidate = &served_by[pos + 1..];
        if candidate.len() >= 2
            && candidate.len() <= 4
            && candidate.chars().all(|c| c.is_ascii_uppercase())
        {
            return Some(candidate);
        }
    }
    None
}

fn iata_to_region(iata: &str) -> Option<&'static str> {
    match iata {
        "GRU" | "GIG" | "EZE" | "SCL" | "BOG" | "LIM" => Some("sa-east"),
        "IAD" | "JFK" | "EWR" | "ORD" | "LAX" | "DFW" => Some("us-east"),
        "LHR" | "AMS" | "FRA" | "CDG" | "MXP" | "MAD" => Some("eu-west"),
        "NRT" | "SIN" | "HKG" | "BOM" | "DEL" | "ICN" => Some("ap-east"),
        "SYD" | "MEL" | "BNE" | "PER" | "AKL" => Some("oc"),
        // ... full table covers ~80 codes
        _ => None,
    }
}
```

If the probe is from `sa-east` and the PoP is `IAD`, that's a mismatch. The diagnosis tells the user exactly which PoP answered and which CDN config to check.

Without multi-region probing this problem is nearly impossible to notice. Uptime monitors that check from a single location or from the same region as the CDN PoP will never see it.

Rule 3 (TTFB anomaly vs. 7-day baseline) is where the engine personalizes to your specific URL. A fixed threshold like "warn if TTFB > 400ms" is meaningless without context. A CDN-cached static page at 400ms is broken. A database-backed API at 400ms is completely normal. Using a threshold calibrated to what's normal for that URL means the engine warns about actual regressions, not just "slowness" in the abstract.

Every 15 minutes, a background worker updates a `(monitor_id, region, mean, stddev)` record from a 7-day sliding window:

```sql
SELECT
    AVG(ttfb_ms)    AS mean_ttfb,
    STDDEV(ttfb_ms) AS stddev_ttfb
FROM probe_results
WHERE monitor_id = $1
  AND region = $2
  AND time > NOW() - INTERVAL '7 days'
  AND ttfb_ms IS NOT NULL
  AND error IS NULL
```

When a new probe comes in, the engine receives `baseline_ttfb_ms` and computes the factor (current TTFB divided by baseline). Factor > 2.0 → warning. The diagnosis includes the factor, the raw numbers, and the estimated recovery:

```
TTFB is 7.5× slower than your 7-day baseline (600ms vs 80ms normal) in eu-west.
Action: Check for recent deploys, increased origin load, or cold starts.
Estimated gain: ~520ms
```

The first time CacheSnap detects this on a real site it feels like magic. In practice it's just a TimescaleDB window query and a ratio check, but the framing as "your normal" rather than "some threshold" is what makes the alert actionable.

The diagnosis engine is fast and pure, but the scheduler that triggers probes has a harder problem: ensuring each monitor fires exactly once per interval across multiple API instances.

The naive approach, where each instance tracks last-check time in memory, breaks immediately under horizontal scale. Two instances both fire at `t=0`, both record `t=300` in memory, both fire again at `t=300`. You get duplicate probe rows, double Lambda invocations, and corrupted baselines.

The fix is a Redis Lua script that atomically reads and writes the last-check timestamp:

```lua
-- KEYS[1] = last-check timestamp, KEYS[2] = warmup index
-- ARGV[1] = current time,         ARGV[2] = configured interval
local last_check = redis.call('GET', KEYS[1])
local warmup_index = redis.call('GET', KEYS[2])
local now = tonumber(ARGV[1])
local interval = tonumber(ARGV[2])

-- During warmup, shorten the interval (never lengthen it).
local effective_interval = interval
if warmup_index ~= false then
  local idx = tonumber(warmup_index)
  local warmup_gap
  if idx == 0 then warmup_gap = 60
  elseif idx == 1 then warmup_gap = 60
  end
  if warmup_gap and warmup_gap < interval then
    effective_interval = warmup_gap
  end
end

-- Claim the slot if the monitor has never run or the interval has elapsed.
if not last_check or (now - tonumber(last_check)) >= effective_interval then
  redis.call('SET', KEYS[1], now)
  if not last_check then
    redis.call('SET', KEYS[2], '0', 'EX', '300')
  elseif warmup_index ~= false and tonumber(warmup_index) < 2 then
    redis.call('INCR', KEYS[2])
    redis.call('EXPIRE', KEYS[2], '300')
  end
  return 1
end
return 0
```

Redis executes Lua scripts atomically, so no other command runs between the `GET` and `SET`. If the script returns 1, this instance won the race and dispatches the probe. Any other instance evaluating at the same millisecond returns 0 and skips.

The warmup logic solves a UX problem: if you add a monitor with a 5-minute interval, you wait 5 minutes before seeing your first data point.
The first two probes instead use a 60-second gap (`warmup_index` 0 and 1). By the time you refresh the dashboard, data is already there. The warmup only accelerates: if your configured interval is already shorter than 60s, the gap is ignored.

Once `should_run_now` returns true, the scheduler dispatches probes to all 8 regions in parallel using Tokio. But unbounded parallelism at scale would instantly 429 every Lambda region: 500 monitors × 8 regions = 4,000 simultaneous invocations per tick.

The solution is two levels of Tokio semaphores: one global cap and one cap per region:

```rust
pub struct ProbeDispatchLimits {
    global: Arc<Semaphore>,
    per_region: RwLock<HashMap<String, Arc<Semaphore>>>,
    cfg: Arc<ProbeDispatchConfig>,
}
```

Before each Lambda POST, the dispatcher acquires both permits with a timeout:

```rust
let global_permit = match acquire_with_timeout(&self.global, acquire_wait).await {
    Ok(p) => p,
    Err(_) => {
        warn!("global semaphore acquire timed out, skipping region for this cycle");
        return;
    }
};
let rsem = self.region_semaphore(ctx.region_id).await;
let region_permit = match acquire_with_timeout(&rsem, acquire_wait).await {
    Ok(p) => p,
    Err(_) => {
        // Give the global slot back before bailing out.
        drop(global_permit);
        return;
    }
};
```

The timeout is the important part. Without it, a stuck Lambda region holds a permit indefinitely and starves every other monitor waiting for that region. With a timeout, the region is simply skipped for that cycle and retried on the next tick.

When Lambda returns 429, both permits are released before the sleep, not after. If we held permits during backoff, we'd block the entire dispatch queue waiting on one throttled region.

Bursts are smoothed with a deterministic jitter function (no `rand` crate required):

```rust
fn dispatch_jitter(monitor_id: Uuid, region_id: &str) -> Duration {
    // Seed with the low 32 bits of the monitor id, then mix in the region bytes.
    let mut h: u32 = (monitor_id.as_u128() & 0xffff_ffff) as u32;
    for b in region_id.bytes() {
        h = h.wrapping_mul(31).wrapping_add(b as u32);
    }
    Duration::from_millis((h % 72) as u64)
}
```

Same monitor + region always produces the same jitter.
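
The two properties claimed for the jitter (same inputs give the same delay, and the delay stays under 72ms) are easy to check in isolation. This is a standalone sketch, not the production code: it uses the same hashing scheme but takes the monitor id as a raw `u128` instead of a `Uuid` so it needs no external crate, and the region names and monitor id are made-up sample values.

```rust
use std::time::Duration;

/// Same mixing scheme as `dispatch_jitter`, with the monitor id passed as a
/// plain u128 (an assumption made to keep this sketch dependency-free).
fn dispatch_jitter(monitor_id: u128, region_id: &str) -> Duration {
    let mut h: u32 = (monitor_id & 0xffff_ffff) as u32;
    for b in region_id.bytes() {
        h = h.wrapping_mul(31).wrapping_add(b as u32);
    }
    Duration::from_millis((h % 72) as u64)
}

fn main() {
    // Sample values for illustration only.
    let monitor_id: u128 = 0x1234_5678_9abc_def0_1234_5678_9abc_def0;
    let regions = ["us-east", "eu-west", "sa-east", "ap-east"];

    for region in regions.iter() {
        let first = dispatch_jitter(monitor_id, region);
        let second = dispatch_jitter(monitor_id, region);
        // Deterministic: no hidden state, no randomness.
        assert_eq!(first, second);
        // Bounded: `h % 72` can never reach 72.
        assert!(first < Duration::from_millis(72));
        println!("{}: {} ms", region, first.as_millis());
    }
}
```

Because the delay is a pure function of monitor and region, a given probe lands at the same offset on every tick, which makes dispatch timing reproducible when debugging.
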
The 72ms ceiling means the entire burst window fits within a single scheduler tick, so no probe is ever delayed into the next cycle.

Combined, these three pieces (the atomic Redis gate, the two-level semaphores, and the deterministic jitter) give you a system that fires each monitor once per interval, caps concurrent Lambda invocations, and spreads each burst across the tick.

For a solo project, keeping the diagnosis deterministic and the scheduler atomic were the two decisions that eliminated the most production incidents. Both feel obvious in retrospect. Neither is what you reach for when moving fast.

CacheSnap is live at https://cachesnap.com. Free tier: 3 monitors, 7-day data retention, 10-minute check intervals. No credit card required. If you want to see what the diagnosis engine finds on your own URLs, it takes about 90 seconds to add a monitor and see the first results.

Questions about the architecture, the Redis scheduling approach, or the Lambda probe design? Happy to discuss in the comments.