[ARTICLE · art-24864] src=dev.to ↗ pub= topic=ai-infrastructure verified=true sentiment=· neutral

Graceful Degradation: Circuit Breakers for External API Dependencies

The HelperX team built a circuit breaker system to prevent cascading failures when external API dependencies go down. The system tracks failures for each dependency in each slot, opening a circuit after three to five consecutive failures and attempting recovery after a reset timeout of one to five minutes, depending on the dependency. A single failed proxy or API endpoint then degrades only the affected slot rather than taking down the other 199 healthy slots.

5 min read · published Jun 12, 2026

When your application depends on external APIs that you don't control, failures are not a question of "if" but "when." X's API rate-limits you. Your proxy provider has an outage. The AI model endpoint returns 503s for 20 minutes.

The question is: does one failure cascade into total system failure, or does your system degrade gracefully?

We built a circuit breaker system for HelperX that keeps healthy slots running when unhealthy ones fail. Here's the implementation.

Without circuit breakers, one dead proxy degrades the entire system. With 200 slots, one bad proxy shouldn't affect the 199 healthy ones.

A circuit breaker sits between your application and an external dependency. It has three states:

     ┌──────────┐
     │  CLOSED  │ ← Normal operation. Requests pass through.
     └────┬─────┘
          │ failures >= threshold
          ▼
     ┌──────────┐
     │   OPEN   │ ← Requests fail immediately. No network calls.
     └────┬─────┘
          │ after resetTimeout
          ▼
    ┌───────────┐
    │ HALF-OPEN │ ← Allow one test request through.
    └─────┬─────┘
          │
    ┌─────┴──────┐
    │ success?   │
    ├─yes───────────► CLOSED (resume normal)
    └─no─────────┘──► OPEN (wait longer)

class CircuitBreaker {
  constructor(name, options = {}) {
    this.name = name;
    this.state = 'closed';
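    this.failures = 0;
    this.successes = 0;
    this.lastFailure = null;
    this.lastAttempt = null;
    this.lastError = null;       // set in onFailure, reported on state changes
    this.halfOpenAttempts = 0;   // reset on each transition to half-open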

    this.threshold = options.threshold || 5;
    this.resetTimeout = options.resetTimeout || 60_000;
    this.halfOpenMax = options.halfOpenMax || 1;
    this.onStateChange = options.onStateChange || (() => {});
  }

  async execute(fn) {
    if (this.state === 'open') {
      if (Date.now() - this.lastFailure >= this.resetTimeout) {
        this.transition('half-open');
      } else {
        throw new CircuitOpenError(
          `Circuit ${this.name} is open. ` +
          `Resets in ${this.timeUntilReset()}ms`
        );
      }
    }

    if (this.state === 'half-open') {
      // Only allow limited requests through
      if (this.halfOpenAttempts >= this.halfOpenMax) {
        throw new CircuitOpenError(
          `Circuit ${this.name} is half-open, max attempts reached`
        );
      }
      this.halfOpenAttempts++;
    }

    this.lastAttempt = Date.now();

    try {
      const result = await fn();
      this.onSuccess();
      return result;
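    } catch (err) {
      // A CircuitOpenError here comes from a nested breaker: this dependency
      // was never called, so it is not counted as a failure of this circuit.
      if (!err?.isCircuitOpen) {
        this.onFailure(err);
      } else if (this.state === 'half-open') {
        // Release the probe slot so a later request can test recovery.
        this.halfOpenAttempts = Math.max(0, this.halfOpenAttempts - 1);
      }
      throw err;
    }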
  }

  onSuccess() {
    this.failures = 0;
    this.successes++;
    if (this.state === 'half-open') {
      this.transition('closed');
    }
  }

  onFailure(err) {
    this.failures++;
    this.lastFailure = Date.now();
    this.lastError = err;

    if (this.failures >= this.threshold) {
      this.transition('open');
    }
  }

  transition(newState) {
    const oldState = this.state;
    this.state = newState;

    if (newState === 'half-open') {
      this.halfOpenAttempts = 0;
    }

    this.onStateChange({
      name: this.name,
      from: oldState,
      to: newState,
      failures: this.failures,
      lastError: this.lastError
    });
  }

  timeUntilReset() {
    if (this.state !== 'open') return 0;
    return Math.max(0,
      this.resetTimeout - (Date.now() - this.lastFailure)
    );
  }

  getStatus() {
    return {
      name: this.name,
      state: this.state,
      failures: this.failures,
      successes: this.successes,
      lastFailure: this.lastFailure,
      timeUntilReset: this.timeUntilReset()
    };
  }
}

class CircuitOpenError extends Error {
  constructor(message) {
    super(message);
    this.name = 'CircuitOpenError';
    this.isCircuitOpen = true;
  }
}

Each slot gets its own circuit breaker for each external dependency:

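// crypto.randomUUID() needs this import on Node versions without a global `crypto`.
// getDb() is the application's own per-slot database helper (not shown).
const crypto = require('node:crypto');

class SlotDependencies {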
  constructor(slotId) {
    this.slotId = slotId;

    this.proxy = new CircuitBreaker(`${slotId}:proxy`, {
      threshold: 3,
      resetTimeout: 120_000,  // 2 minutes
      onStateChange: (e) => this.logStateChange(e)
    });

    this.ai = new CircuitBreaker(`${slotId}:ai`, {
      threshold: 5,
      resetTimeout: 60_000,   // 1 minute
      onStateChange: (e) => this.logStateChange(e)
    });

    this.api = new CircuitBreaker(`${slotId}:api`, {
      threshold: 3,
      resetTimeout: 300_000,  // 5 minutes (rate limits are longer)
      onStateChange: (e) => this.logStateChange(e)
    });
  }

  logStateChange(event) {
    const db = getDb(this.slotId);
    db.prepare(`
      INSERT INTO audit_log (id, module, action, status, detail, timestamp)
      VALUES (?, 'system', 'circuit_breaker', ?, ?, datetime('now'))
    `).run(
      crypto.randomUUID(),
      event.to === 'open' ? 'warning' : 'info',
      `${event.name}: ${event.from} → ${event.to} (${event.failures} failures)`
    );
  }
}

When Slot A's proxy circuit opens, Slot A stops sending requests through that proxy. Slots B through Z continue normally; they have their own circuit breakers with their own state.
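The `getSlotDependencies` helper used in the later snippets is not shown in the article. A minimal sketch, assuming a process-wide `Map` keyed by slot ID and a factory function (both hypothetical), might look like the following. The property that matters is that a slot ID always maps to the same breaker set, so failure counts persist between actions.

```javascript
// Hypothetical registry: one dependency set per slot, created lazily.
// `createDeps` stands in for `(id) => new SlotDependencies(id)`.
function createSlotRegistry(createDeps) {
  const registry = new Map();

  return {
    // Return the existing dependency set for a slot, or create it once.
    get(slotId) {
      if (!registry.has(slotId)) {
        registry.set(slotId, createDeps(slotId));
      }
      return registry.get(slotId);
    },
    // Drop a slot's breakers, e.g. when the slot is deleted.
    remove(slotId) {
      return registry.delete(slotId);
    },
    size() {
      return registry.size;
    }
  };
}

// Stand-in factory so the sketch runs on its own.
const slots = createSlotRegistry((slotId) => ({ slotId, failures: 0 }));

slots.get('slot-a').failures += 3;           // Slot A accumulates failures
console.log(slots.get('slot-a').failures);   // 3
console.log(slots.get('slot-b').failures);   // 0
```

Because state lives in the registry entry rather than in a shared object, Slot A's failures never show up in Slot B's counters.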

async function executeModuleAction(slotId, module) {
  const deps = getSlotDependencies(slotId);

  // Step 1: Find a tweet to reply to (uses proxy)
  let tweet;
  try {
    tweet = await deps.proxy.execute(() =>
      searchTweets(slotId, module.config.query)
    );
  } catch (err) {
    if (err.isCircuitOpen) {
      logAudit(slotId, module.name, 'skipped',
        `Proxy circuit open, resets in ${deps.proxy.timeUntilReset()}ms`);
      return;
    }
    throw err;
  }

  // Step 2: Generate AI reply (uses AI endpoint)
  let reply;
  try {
    reply = await deps.ai.execute(() =>
      generateReply(slotId, tweet, module.config.persona)
    );
  } catch (err) {
    if (err.isCircuitOpen) {
      logAudit(slotId, module.name, 'skipped',
        `AI circuit open, resets in ${deps.ai.timeUntilReset()}ms`);
      return;
    }
    throw err;
  }

  // Step 3: Send the reply (uses proxy + API)
  try {
    await deps.proxy.execute(() =>
      deps.api.execute(() =>
        sendReply(slotId, tweet.id, reply)
      )
    );
  } catch (err) {
    if (err.isCircuitOpen) {
      logAudit(slotId, module.name, 'skipped',
        `Circuit open: ${err.message}`);
      return;
    }
    throw err;
  }

  logAudit(slotId, module.name, 'success', reply);
}

Each step of the action is wrapped in its own circuit breaker. If the AI is down but the proxy is fine, the system skips AI-dependent modules but can still run non-AI modules (scheduled posts, reposts).
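The three try/catch blocks above share one shape: run the call through a breaker, and treat an open circuit as a skip rather than an error. One possible refactoring, sketched here with a hypothetical `runGuarded` helper and stub breakers (neither is from the original code), returns a tagged result instead of repeating the catch logic.

```javascript
// Hypothetical helper: run `fn` through `breaker`; report an open circuit
// as a skip, and let every other error propagate to the caller.
async function runGuarded(breaker, fn) {
  try {
    return { skipped: false, value: await breaker.execute(fn) };
  } catch (err) {
    if (err && err.isCircuitOpen) {
      return { skipped: true, reason: err.message };
    }
    throw err;
  }
}

// Stub breakers so the sketch runs on its own; a real one would be
// an instance of the CircuitBreaker class.
const closedBreaker = { execute: (fn) => fn() };
const openBreaker = {
  execute: async () => {
    const err = new Error('Circuit demo:ai is open');
    err.isCircuitOpen = true;
    throw err;
  }
};

async function demo() {
  const found = await runGuarded(closedBreaker, async () => 'tweet-123');
  const reply = await runGuarded(openBreaker, async () => 'never runs');
  return { found, reply };
}
```

A caller would then check `result.skipped`, log the reason, and return early, which keeps each step of the action to a few lines.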

The dashboard shows circuit breaker state for each slot:

function getSystemHealth() {
  const slots = getAllActiveSlots();

  return slots.map(slot => {
    const deps = getSlotDependencies(slot.id);
    return {
      slotId: slot.id,
      proxy: deps.proxy.getStatus(),
      ai: deps.ai.getStatus(),
      api: deps.api.getStatus(),
      healthy: ['proxy', 'ai', 'api']
        .every(dep => deps[dep].state === 'closed')
    };
  });
}

An operator sees at a glance which slots are healthy, which have open circuits, and when each circuit will attempt recovery.
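As an illustration of how that view could be condensed, the sketch below takes an array in the shape returned by `getSystemHealth()` and lists the non-closed circuits, ordered by how soon they will attempt recovery. The `summarizeHealth` name and the sample values are assumptions made for this example.

```javascript
// Hypothetical summary over the array returned by getSystemHealth().
function summarizeHealth(report) {
  const deps = ['proxy', 'ai', 'api'];
  const open = [];

  for (const slot of report) {
    for (const dep of deps) {
      const status = slot[dep];
      if (status.state !== 'closed') {
        open.push({
          slotId: slot.slotId,
          dependency: dep,
          state: status.state,
          timeUntilReset: status.timeUntilReset
        });
      }
    }
  }

  // Circuits closest to their recovery attempt come first.
  open.sort((a, b) => a.timeUntilReset - b.timeUntilReset);

  return {
    total: report.length,
    healthy: report.filter((slot) => slot.healthy).length,
    open
  };
}

// Sample data in the shape produced by getStatus(); values are made up.
const sample = [
  { slotId: 'a', healthy: true,
    proxy: { state: 'closed', timeUntilReset: 0 },
    ai: { state: 'closed', timeUntilReset: 0 },
    api: { state: 'closed', timeUntilReset: 0 } },
  { slotId: 'b', healthy: false,
    proxy: { state: 'open', timeUntilReset: 90000 },
    ai: { state: 'closed', timeUntilReset: 0 },
    api: { state: 'open', timeUntilReset: 30000 } }
];

const summary = summarizeHealth(sample);
console.log(`${summary.healthy}/${summary.total} slots healthy`); // 1/2 slots healthy
```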

Default thresholds aren't universal. We tuned ours based on failure patterns:

Dependency | Threshold  | Reset timeout | Why
-----------|------------|---------------|----------------------------------------------------------------------
Proxy      | 3 failures | 2 min         | Proxy failures are usually transient. Quick retry.
AI model   | 5 failures | 1 min         | AI endpoints recover fast. Higher threshold to absorb occasional 503s.
X API      | 3 failures | 5 min         | Rate limits last 15 min. Longer reset avoids hammering.

The key insight: reset timeout should match the expected recovery time of the dependency, not an arbitrary number.
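Where a dependency reports its own recovery time, one way to follow this principle is the standard HTTP `Retry-After` header, which carries either a number of seconds or an HTTP date. The helper below is a hypothetical sketch: the function name and the fallback parameter are assumptions, the breaker in this article uses fixed timeouts, and whether a given API sends this header has to be checked against its documentation.

```javascript
// Hypothetical helper: convert a Retry-After header value into a reset
// timeout in milliseconds, falling back to a fixed default.
function resetTimeoutFromRetryAfter(headerValue, fallbackMs, nowMs = Date.now()) {
  if (typeof headerValue !== 'string' || headerValue.trim() === '') {
    return fallbackMs;
  }
  const value = headerValue.trim();

  // Form 1: delay in whole seconds, e.g. "120".
  if (/^\d+$/.test(value)) {
    return Number(value) * 1000;
  }

  // Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const retryAt = Date.parse(value);
  if (Number.isNaN(retryAt)) {
    return fallbackMs;
  }
  // A date in the past means the dependency can be retried right away.
  return Math.max(0, retryAt - nowMs);
}

console.log(resetTimeoutFromRetryAfter('900', 300_000));     // 900000
console.log(resetTimeoutFromRetryAfter(undefined, 300_000)); // 300000
```

The fixed values in the table remain the fallback whenever the header is absent or unparseable.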

1. One circuit breaker per dependency per tenant. Global circuit breakers cause healthy tenants to suffer for unhealthy ones. Per-tenant isolation is the whole point.

2. Log state transitions. When a circuit opens, the audit log records it. This is the most valuable diagnostic information during incidents.

3. Graceful skip > hard failure. When a circuit is open, the action is skipped and logged: not retried, not errored, not queued. The scheduler moves to the next action. Queuing failures leads to thundering herds when the circuit closes.

4. Nested circuit breakers work. An action that uses proxy + API goes through both breakers. If either is open, the action is skipped. This handles compound failures cleanly.

5. Half-open state prevents oscillation. Without half-open, a circuit that closes immediately sends a burst of requests that may re-trigger the failure. Half-open allows exactly one test request, preventing the open/close/open oscillation.
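Lesson 5 can be exercised without waiting for real timeouts. The sketch below is a condensed stand-in for the `CircuitBreaker` class, not the class itself: it keeps only the state machine and adds an injectable `now` clock (an assumption, not present in the original) so the open, half-open, and closed transitions can be stepped through deterministically.

```javascript
// Condensed stand-in for CircuitBreaker with an injectable clock.
// `now` is a hypothetical option added for testing.
class MiniBreaker {
  constructor({ threshold = 3, resetTimeout = 1000, now = Date.now } = {}) {
    this.state = 'closed';
    this.failures = 0;
    this.lastFailure = 0;
    this.probing = false;
    this.threshold = threshold;
    this.resetTimeout = resetTimeout;
    this.now = now;
  }

  async execute(fn) {
    if (this.state === 'open') {
      if (this.now() - this.lastFailure < this.resetTimeout) {
        throw Object.assign(new Error('circuit open'), { isCircuitOpen: true });
      }
      this.state = 'half-open';
      this.probing = false;
    }
    if (this.state === 'half-open') {
      // Exactly one probe may be in flight at a time.
      if (this.probing) {
        throw Object.assign(new Error('probe in flight'), { isCircuitOpen: true });
      }
      this.probing = true;
    }
    try {
      const result = await fn();
      this.failures = 0;
      this.state = 'closed';
      return result;
    } catch (err) {
      this.failures++;
      this.lastFailure = this.now();
      if (this.failures >= this.threshold) this.state = 'open';
      throw err;
    }
  }
}

// Drive the breaker with a fake clock instead of real waiting.
async function lifecycle() {
  let clock = 0;
  const breaker = new MiniBreaker({ threshold: 2, resetTimeout: 1000, now: () => clock });
  const fail = async () => { throw new Error('503'); };
  const seen = [];

  for (let i = 0; i < 2; i++) await breaker.execute(fail).catch(() => {});
  seen.push(breaker.state);                        // 'open' after 2 failures

  clock = 1000;                                    // reset timeout elapsed
  let release;
  const probe = breaker.execute(() => new Promise((resolve) => { release = resolve; }));
  seen.push(breaker.state);                        // 'half-open' while probing

  // A second request during the probe is rejected without calling fn.
  const second = await breaker.execute(async () => 'ok').catch((e) => e.message);
  seen.push(second);                               // 'probe in flight'

  release('ok');
  await probe;
  seen.push(breaker.state);                        // 'closed' again
  return seen;
}
```

The second request is turned away while the probe is pending, so a recovering dependency sees a single test call rather than a burst.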

HelperX uses per-slot circuit breakers to keep your accounts running independently, so one bad proxy doesn't affect the rest. Free 30-day trial.
