# We built a free status monitor for 77 AI APIs. Here's what 6 weeks of data taught us.

> Source: <https://dev.to/max_98b3db49c06de66802dcd/we-built-a-free-status-monitor-for-77-ai-apis-heres-what-6-weeks-of-data-taught-us-56ko>
> Published: 2026-06-22 09:59:03+00:00

Every AI developer has been here: your app is throwing 503s, users are pinging you, and you have 12 browser tabs open — OpenAI status page, Anthropic status page, the GitHub Copilot health page, three different Discord servers — trying to figure out *is this me or is it them?*

That's the problem we set out to solve. [Prismix](https://prismix.dev) aggregates status from 77 AI services in one place. Six weeks of running it in production taught us some things that might save you time.

AI APIs don't fail like traditional infrastructure. They fail in weird, partial ways:

The official status pages are optimistic by design. They're customer-facing communications tools, not real-time engineering dashboards. There's nothing wrong with this — but it means you need a different mental model for "is this service down?"

When you watch 77 AI services simultaneously, patterns emerge fast.

**OpenAI** is the most-watched service (and has the most incidents to watch). The pattern is almost always the same: `investigating` → `identified` → `monitoring` → `resolved`, typically in 45–90 minutes. The `investigating` phase is where most developers panic: it looks bad but usually resolves without action on your end.
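
A minimal sketch of that lifecycle as data, assuming each status-page update can be reduced to a phase name and a timestamp. `IncidentUpdate`, `currentPhase`, and `minutesToResolve` are hypothetical names, not part of any status-page API, and the timestamps are made up.

```typescript
// The four phases from the post, in their usual order.
type Phase = "investigating" | "identified" | "monitoring" | "resolved";

// Hypothetical shape for one status-page update.
interface IncidentUpdate {
  phase: Phase;
  at: string; // ISO 8601 timestamp
}

// Phase of the most recent update, or null if there are no updates yet.
function currentPhase(updates: IncidentUpdate[]): Phase | null {
  if (updates.length === 0) return null;
  const latest = updates.reduce((a, b) =>
    Date.parse(b.at) >= Date.parse(a.at) ? b : a,
  );
  return latest.phase;
}

// Minutes from the first update to the first "resolved" update,
// or null while the incident is still open.
function minutesToResolve(updates: IncidentUpdate[]): number | null {
  const resolved = updates
    .filter((u) => u.phase === "resolved")
    .map((u) => Date.parse(u.at));
  if (resolved.length === 0) return null;
  const start = Math.min(...updates.map((u) => Date.parse(u.at)));
  return (Math.min(...resolved) - start) / 60_000;
}

// Made-up timestamps for illustration.
const example: IncidentUpdate[] = [
  { phase: "investigating", at: "2024-01-01T14:00:00Z" },
  { phase: "identified", at: "2024-01-01T14:20:00Z" },
  { phase: "monitoring", at: "2024-01-01T14:45:00Z" },
  { phase: "resolved", at: "2024-01-01T15:10:00Z" },
];
console.log(currentPhase(example)); // resolved
console.log(minutesToResolve(example)); // 70
```

Tracking the phase rather than a binary up/down flag makes it easy to, say, hold off on failover while an incident is still in `investigating`.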

**Anthropic** has run noticeably cleanly given how fast its API usage has grown. Incidents are rarer and shorter, and when they do happen, updates arrive faster than from most providers.

**The long tail is interesting.** Services like Replicate, Runway, ElevenLabs, and Suno have incident patterns that don't correlate with OpenAI at all. If you're routing across multiple providers for redundancy, these are genuinely independent failure domains — worth knowing.

**The "silent degradation" problem is real.** Multiple times we've seen a service show "operational" on its status page while our uptime probe was timing out. This is the main reason Prismix shows a latency sparkline per service — the status page is authoritative for *announced* incidents, but the probe catches *real* ones.
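
One way to combine the two signals, sketched under assumptions: an announced incident always wins, and an "operational" page is overridden only when at least half of the recent probes failed or were slow. The `verdict` function, the `silently_degraded` label, and both thresholds are hypothetical, not Prismix's actual logic.

```typescript
// Status as announced on an official status page (simplified to three values).
type Announced = "operational" | "degraded" | "outage";
// Our own verdict: the announced status, or a probe-detected degradation.
type Verdict = Announced | "silently_degraded";

interface ProbeSample {
  ok: boolean; // false on timeout, network error, or 5xx
  latencyMs: number;
}

// Hypothetical thresholds; real values would need tuning per service.
const SLOW_MS = 5_000;
const BAD_RATIO = 0.5;

function verdict(announced: Announced, recent: ProbeSample[]): Verdict {
  // An announced incident is authoritative, so report it unchanged.
  if (announced !== "operational") return announced;
  if (recent.length === 0) return "operational"; // no evidence either way
  const bad = recent.filter((s) => !s.ok || s.latencyMs > SLOW_MS).length;
  return bad / recent.length >= BAD_RATIO ? "silently_degraded" : "operational";
}

const lastThree: ProbeSample[] = [
  { ok: false, latencyMs: 10_000 },
  { ok: false, latencyMs: 10_000 },
  { ok: true, latencyMs: 800 },
];
console.log(verdict("operational", lastThree)); // silently_degraded
console.log(verdict("outage", lastThree)); // outage
```

Requiring a ratio over several samples, rather than reacting to a single timeout, keeps one flaky probe from flagging a healthy service.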

[Prismix](https://prismix.dev) pulls from official status pages, aggregates them into a single dashboard, and adds a few things that the individual pages don't have:

**Per-service latency probes**: 24-hour sparklines showing actual response times, not just announced incidents. This catches the "silent degradation" cases.
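
A probe of this kind can be sketched with `fetch` plus an `AbortController` timeout. This assumes a plain `GET` against some endpoint of the service, that any 5xx or timeout counts as a failure, and that a 4xx (for example a 401 without an API key) still proves the service is reachable; `probe` and its defaults are hypothetical. The `fetcher` parameter exists only so the function can be exercised without network access.

```typescript
interface ProbeResult {
  ok: boolean;
  latencyMs: number;
  status?: number; // HTTP status; absent on timeout or network error
}

// Time one GET request, giving up after `timeoutMs`.
// `fetcher` defaults to the global fetch (Workers, Node 18+).
async function probe(
  url: string,
  timeoutMs = 10_000,
  fetcher: typeof fetch = fetch,
): Promise<ProbeResult> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const started = Date.now();
  try {
    const res = await fetcher(url, { method: "GET", signal: controller.signal });
    const latencyMs = Date.now() - started;
    await res.body?.cancel(); // only the timing matters; release the body
    // Assumption: 4xx (e.g. 401 without a key) still means "reachable".
    return { ok: res.status < 500, latencyMs, status: res.status };
  } catch {
    // Aborted by the timer, or a network error.
    return { ok: false, latencyMs: Date.now() - started };
  } finally {
    clearTimeout(timer);
  }
}
```

This measures time to response headers, which is enough to catch the hangs and timeouts that status pages miss; storing one result per run gives the series a sparkline is drawn from.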

**Cross-service incident timeline**: `/incidents` shows everything that happened across all 77 services in one scrollable feed. Useful for postmortems ("was anything else degraded when our error rate spiked at 3pm Tuesday?").
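
The postmortem question above boils down to an interval-overlap query over a merged feed. A minimal sketch, assuming each incident has a start time and an optional resolve time; the `TimelineIncident` shape and both function names are hypothetical, and the incidents in the example are made up.

```typescript
// Hypothetical shape of one entry in a cross-service incident feed.
interface TimelineIncident {
  service: string;
  title: string;
  startedAt: string; // ISO 8601
  resolvedAt: string | null; // null while still open
}

// Merge per-service feeds into one newest-first timeline.
function mergeTimeline(...feeds: TimelineIncident[][]): TimelineIncident[] {
  return feeds
    .flat()
    .sort((a, b) => Date.parse(b.startedAt) - Date.parse(a.startedAt));
}

// Services whose incident window covers `at`: the postmortem question
// "was anything else degraded when our error rate spiked?"
function activeAt(timeline: TimelineIncident[], at: string): string[] {
  const t = Date.parse(at);
  const services = timeline
    .filter((i) => {
      const start = Date.parse(i.startedAt);
      const end = i.resolvedAt === null ? Infinity : Date.parse(i.resolvedAt);
      return start <= t && t <= end;
    })
    .map((i) => i.service);
  return Array.from(new Set(services)).sort();
}

// Made-up incidents for illustration.
const timeline = mergeTimeline(
  [{ service: "openai", title: "Elevated errors", startedAt: "2024-01-02T14:40:00Z", resolvedAt: "2024-01-02T15:30:00Z" }],
  [{ service: "elevenlabs", title: "Slow synthesis", startedAt: "2024-01-02T09:00:00Z", resolvedAt: "2024-01-02T10:00:00Z" }],
);
console.log(activeAt(timeline, "2024-01-02T15:00:00Z").join(", ")); // openai
```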

**Embeddable status badges**: put a live "OpenAI: operational" badge on your own app's status page with one line of HTML.
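
The post does not show the embed snippet, so the following is only a sketch of how a badge endpoint could render one: a tiny SVG served as `image/svg+xml` and embedded with an `<img>` tag. The colours, the `badgeSvg` name, and any URL you would point the `<img>` at (for example a hypothetical `/badge/openai.svg` route) are assumptions.

```typescript
// Hypothetical colour per status; unknown statuses fall back to grey.
const BADGE_COLORS: Record<string, string> = {
  operational: "#2ea44f",
  degraded: "#dbab09",
  outage: "#d73a49",
};

const XML_ESCAPES: Record<string, string> = {
  "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;",
};

// Escape text before placing it inside SVG markup.
function escapeXml(text: string): string {
  return text.replace(/[&<>"']/g, (c) => XML_ESCAPES[c]);
}

// Render a single-colour SVG badge such as "OpenAI: operational".
function badgeSvg(service: string, status: string): string {
  const text = `${service}: ${status}`;
  const width = 12 + text.length * 7; // rough width estimate from the raw text
  const color = BADGE_COLORS[status] ?? "#6a737d";
  const label = escapeXml(text);
  return [
    `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="20" role="img" aria-label="${label}">`,
    `<rect width="${width}" height="20" rx="3" fill="${color}"/>`,
    `<text x="6" y="14" fill="#fff" font-family="Verdana,sans-serif" font-size="11">${label}</text>`,
    `</svg>`,
  ].join("");
}
```

Because the badge is an image, the embedding page needs no JavaScript; a short cache lifetime on the response keeps it close to live.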

**Public REST API**: `GET /api/v1/statuses` returns current status for all 77 services as JSON. No auth, no rate limit for reasonable use, CORS open. Free forever.
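
A minimal client sketch. The post only says the endpoint returns JSON for all 77 services, so the `ServiceStatus` shape (a top-level array of `{ id, name, status }`) and the service ids are assumptions; check the real payload before relying on them. The filtering step is kept separate so it can be tested without network access.

```typescript
// Assumed payload shape; verify against the real response.
interface ServiceStatus {
  id: string;
  name: string;
  status: string; // e.g. "operational"
}

// GET /api/v1/statuses (no auth and CORS open, per the post).
async function fetchStatuses(base = "https://prismix.dev"): Promise<ServiceStatus[]> {
  const res = await fetch(`${base}/api/v1/statuses`);
  if (!res.ok) throw new Error(`statuses request failed: HTTP ${res.status}`);
  return (await res.json()) as ServiceStatus[];
}

// Keep only the services you depend on that are not currently operational.
function problemsIn(all: ServiceStatus[], dependsOn: string[]): ServiceStatus[] {
  const wanted = new Set(dependsOn);
  return all.filter((s) => wanted.has(s.id) && s.status !== "operational");
}

// Usage (hypothetical service ids):
// const down = problemsIn(await fetchStatuses(), ["openai", "anthropic"]);
```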

**RSS feed**: `/incidents.rss`, if you want AI incident updates in your feed reader.

It's free because it runs entirely on Cloudflare's free tier (Workers + KV). The Pro tier ($10/mo) adds email and webhook alerts for services you care about, but the core dashboard stays free.

The stack is Astro 5 SSR + Cloudflare Workers + KV. We wrote about the performance walls we hit [in a previous post](https://dev.to/max_98b3db49c06de66802dcd/4-perf-walls-i-hit-shipping-an-ai-hub-on-cloudflare-workers-kv-246) — the short version is that 77 parallel KV reads per request is a bad idea and a single pre-aggregated snapshot blob is much better.
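
The difference between the two read patterns can be sketched against a minimal stand-in for the KV binding. `KVLike`, `MemoryKV`, the key names, and the snapshot shape are hypothetical; a real Workers `KVNamespace` should satisfy the `KVLike` interface, since its `get(key)` resolves to text or `null` by default.

```typescript
// The subset of the Workers KV binding used here.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// Map-backed fake that counts reads, for comparing the two patterns.
class MemoryKV implements KVLike {
  reads = 0;
  private store = new Map<string, string>();
  async get(key: string): Promise<string | null> {
    this.reads++;
    return this.store.get(key) ?? null;
  }
  async put(key: string, value: string): Promise<void> {
    this.store.set(key, value);
  }
}

type Snapshot = Record<string, string>; // service id -> status
const SNAPSHOT_KEY = "snapshot"; // hypothetical key name

// Slow path: one KV read per service on every page view.
async function readPerService(kv: KVLike, ids: string[]): Promise<Snapshot> {
  const values = await Promise.all(ids.map((id) => kv.get(`status:${id}`)));
  return Object.fromEntries(ids.map((id, i) => [id, values[i] ?? "unknown"]));
}

// Fast path: the cron job stores one pre-aggregated blob; each request reads it once.
async function readSnapshot(kv: KVLike): Promise<Snapshot | null> {
  const raw = await kv.get(SNAPSHOT_KEY);
  return raw === null ? null : (JSON.parse(raw) as Snapshot);
}
```

With 77 services the per-service path costs 77 reads per page view; the snapshot path costs one, whatever the service count.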

One thing that surprised us: KV's free tier gives you 100,000 *reads* per day but only 1,000 *writes*. The cron job that refreshes status runs every 5 minutes, so every write is conditional: we only write when the content has actually changed. That dropped writes from ~8,400/day to ~600/day. Monitoring infrastructure has to be cheap to run, otherwise the incentive to keep it free disappears.
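
The conditional write can be sketched as below, assuming the cron job serialises the statuses into a single snapshot key. The post doesn't say how many keys each run writes, so `refreshSnapshot`, `canonical`, and the key name are hypothetical. Sorting keys before serialising keeps identical data byte-identical, so the comparison doesn't report spurious changes.

```typescript
// The subset of the Workers KV binding used here.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

type Statuses = Record<string, string>; // service id -> status
const SNAPSHOT_KEY = "snapshot"; // hypothetical key name

// Serialise with sorted keys so identical data always yields identical text.
function canonical(statuses: Statuses): string {
  const entries = Object.entries(statuses).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return JSON.stringify(Object.fromEntries(entries));
}

// Called by the 5-minute cron. Costs one read per run (reads are the plentiful
// side of the free tier) and a write only when the content changed.
async function refreshSnapshot(kv: KVLike, statuses: Statuses): Promise<boolean> {
  const next = canonical(statuses);
  const previous = await kv.get(SNAPSHOT_KEY);
  if (previous === next) return false; // unchanged: skip the write
  await kv.put(SNAPSHOT_KEY, next);
  return true;
}
```

One design caveat: any value that changes on every run, such as a "last checked" timestamp, has to stay out of the compared blob, or every run will look like a change and the write savings disappear.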

Six weeks in, Prismix tracks 77 services with a clean incident timeline and growing usage. What we don't have yet is signal on what matters to *you*.

Some things we're genuinely uncertain about:

If any of that resonates, drop a comment. Honest feedback shapes what gets built next.

Live at [prismix.dev](https://prismix.dev).

*Also at Prismix: an MCP server directory with 500+ servers and a curated AI news feed — but the status monitoring is the part we're most curious to hear about.*
