{"slug": "we-built-a-free-status-monitor-for-77-ai-apis-here-s-what-6-weeks-of-data-taught", "title": "We built a free status monitor for 77 AI APIs. Here's what 6 weeks of data taught us.", "summary": "Prismix, a free status monitor for 77 AI APIs, has been running for six weeks and reveals that AI APIs fail in partial, non-traditional ways. OpenAI has the most incidents, typically resolving in 45–90 minutes, while Anthropic runs cleaner with rarer and shorter incidents. The tool also exposes a 'silent degradation' problem where services show operational on status pages but time out on probes.", "body_md": "Every AI developer has been here: your app is throwing 503s, users are pinging you, and you have 12 browser tabs open — OpenAI status page, Anthropic status page, the GitHub Copilot health page, three different Discord servers — trying to figure out *is this me or is it them?*\n\nThat's the problem we set out to solve. [Prismix](https://prismix.dev) aggregates status from 77 AI services in one place. Six weeks of running it in production taught us some things that might save you time.\n\nAI APIs don't fail like traditional infrastructure. They fail in weird, partial ways:\n\nThe official status pages are optimistic by design. They're customer-facing communications tools, not real-time engineering dashboards. There's nothing wrong with this — but it means you need a different mental model for \"is this service down?\"\n\nWhen you watch 77 AI services simultaneously, patterns emerge fast.\n\n**OpenAI** is the most-watched service (and has the most incidents to watch). The pattern is almost always the same: `investigating` → `identified` → `monitoring` → `resolved`, typically in 45–90 minutes. The `investigating` phase is where most developers panic — it looks bad but usually resolves without action on your end.\n\n**Anthropic** runs noticeably clean given how fast its API usage has grown. Incidents are rarer and shorter. 
When they do happen, updates arrive faster than most providers.\n\n**The long tail is interesting.** Services like Replicate, Runway, ElevenLabs, and Suno have incident patterns that don't correlate with OpenAI at all. If you're routing across multiple providers for redundancy, these are genuinely independent failure domains — worth knowing.\n\n**The \"silent degradation\" problem is real.** Multiple times we've seen a service show \"operational\" on its status page while our uptime probe was timing out. This is the main reason Prismix shows a latency sparkline per service — the status page is authoritative for *announced* incidents, but the probe catches *real* ones.\n\n[Prismix](https://prismix.dev) pulls from official status pages, aggregates them into a single dashboard, and adds a few things that the individual pages don't have:\n\n**Per-service latency probes** — 24-hour sparklines showing actual response times, not just announced incidents. This catches the \"silent degradation\" cases.\n\n**Cross-service incident timeline** — `/incidents` shows everything that happened across all 77 services in one scrollable feed. Useful for postmortems (\"was anything else degraded when our error rate spiked at 3pm Tuesday?\").\n\n**Embeddable status badges** — put a live \"OpenAI: operational\" badge in your own app's status page with one line of HTML.\n\n**Public REST API** — `GET /api/v1/statuses` returns current status for all 77 services as JSON. No auth, no rate limit for reasonable use, CORS open. Free forever.\n\n**RSS feed** — `/incidents.rss` if you want AI incident updates in your feed reader.\n\nIt's free because it runs entirely on Cloudflare's free tier (Workers + KV). The Pro tier ($10/mo) adds email and webhook alerts for services you care about, but the core dashboard stays free.\n\nThe stack is Astro 5 SSR + Cloudflare Workers + KV. 
We wrote about the performance walls we hit [in a previous post](https://dev.to/max_98b3db49c06de66802dcd/4-perf-walls-i-hit-shipping-an-ai-hub-on-cloudflare-workers-kv-246) — the short version is that 77 parallel KV reads per request is a bad idea and a single pre-aggregated snapshot blob is much better.\n\nOne thing that surprised us: KV's free tier gives you 100,000 *reads* per day but only 1,000 *writes*. The cron job that refreshes status runs every 5 minutes, so every write is conditional — only write if the content actually changed. That dropped writes from ~8,400/day to ~600/day. Monitoring infrastructure has to be cheap to run, otherwise the incentive to keep it free disappears.\n\nSix weeks in, Prismix tracks 77 services with a clean incident timeline and growing usage. What we don't have yet is signal on what matters to *you*.\n\nSome things we're genuinely uncertain about:\n\nIf any of that resonates, drop a comment. Honest feedback shapes what gets built next.\n\nLive at [prismix.dev](https://prismix.dev).\n\n*Also at Prismix: an MCP server directory with 500+ servers and a curated AI news feed — but the status monitoring is the part we're most curious to hear about.*", "url": "https://wpnews.pro/news/we-built-a-free-status-monitor-for-77-ai-apis-here-s-what-6-weeks-of-data-taught", "canonical_source": "https://dev.to/max_98b3db49c06de66802dcd/we-built-a-free-status-monitor-for-77-ai-apis-heres-what-6-weeks-of-data-taught-us-56ko", "published_at": "2026-06-22 09:59:03+00:00", "updated_at": "2026-06-22 10:09:56.378504+00:00", "lang": "en", "topics": ["artificial-intelligence", "developer-tools", "ai-infrastructure"], "entities": ["Prismix", "OpenAI", "Anthropic", "Cloudflare", "Replicate", "Runway", "ElevenLabs", "Suno"], "alternates": {"html": "https://wpnews.pro/news/we-built-a-free-status-monitor-for-77-ai-apis-here-s-what-6-weeks-of-data-taught", "markdown": 
"https://wpnews.pro/news/we-built-a-free-status-monitor-for-77-ai-apis-here-s-what-6-weeks-of-data-taught.md", "text": "https://wpnews.pro/news/we-built-a-free-status-monitor-for-77-ai-apis-here-s-what-6-weeks-of-data-taught.txt", "jsonld": "https://wpnews.pro/news/we-built-a-free-status-monitor-for-77-ai-apis-here-s-what-6-weeks-of-data-taught.jsonld"}}