Failure numbers every programmer should know

A blog post titled "Failure numbers every programmer should know" provides reliability napkin math for hardware, cloud services, and software defects. It belongs to a blog whose other posts cover AI datacenter aesthetics, GPU monitoring, and a cloud-era reading of Gray's '5 minute rule', among other topics. The post gives programmers failure rate benchmarks to use when building more resilient systems.

Blog

safetykit
A small collection of safety demos for human-in-the-loop scripts.

Failure numbers every programmer should know
Reliability napkin math for hardware, cloud services, and software defects.

Can an AI datacenter be beautiful?
Intelligence too cheap to meter. Industry too cheap to inspire.

There's been a vibe shift in vibe coding
Senior's eyes are lighting up in delight.

20,000 healthy GPUs
How Modal does active and passive monitoring on hyperscalers and neoclouds.

The 10 best software podcast episodes I ever heard
Highly recommended listens from a podcast addict.

Larval stage support engineering: great at what doesn’t scale
The three core mantras of early support engineering success.

"A Foundational Result in Machine Learning"
Diving into single-layer perceptrons and basic information theory.

Gray’s ‘5 minute rule’ in the cloud era
If an item is accessed frequently enough, it should be main memory resident. For 1987 technology, “frequently enough” means about every five minutes.

Aussie engineers, get to The States
The E3 is for you and me.

More Than DNS: The 14 hour AWS us-east-1 outage
A thorough review of a major cloud outage.

You should have private evals
Everybody should have a personal set of test prompts to try on LLMs.