[ARTICLE · art-23628] src=dev.to pub= topic=ai-tools verified=true sentiment=· neutral

Three sleep intervals for three APIs: Steam 250ms, GitHub 100ms, HuggingFace none

A developer built ETL pipelines for three directory sites in April and ran into different rate limits for the Steam, GitHub, and HuggingFace APIs. The Steam pipeline sleeps 250ms between requests even though the community-derived safe rate is about one request per 1.5 seconds, accepting occasional HTTP 429 errors as non-fatal for review data. The GitHub pipeline sleeps 100ms and relies on an authenticated token for 5,000 requests per hour, while the HuggingFace model registry pipeline uses no sleep interval and has seen no 429s at roughly 100 rapid-fire requests.

5 min read · Published Jun 6, 2026

When I built the ETL pipelines for three programmatic directory sites in April — Top AI Tools (HuggingFace data), Find Games Like (Steam data), and Open Alternative To (GitHub data) — I had to figure out rate limits for three completely different APIs in the same week. The numbers, the failure modes, and the right way to handle errors are all different.

Here's what I actually shipped and the reasoning behind each number.

Steam's developer docs are sparse on hard rate-limit specifics. What I found from community discussion and trial: roughly 200 requests per 5 minutes per IP on the public Web API, which works out to one request per 1.5 seconds as a safe interval. That figure is community-derived rather than officially documented. My code comments on this openly:

await sleep(250); // Steam rate limit: ~200/5min, 1.5s is safe; 250ms is aggressive but usually fine

I chose 250ms anyway because the ETL runs as a nightly GitHub Actions job over ~60 game entries. At 250ms that's 15 seconds of sleep total. At 1.5 seconds it would be 90 seconds. The gap matters when the cron has three sites to process.
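The arithmetic behind those numbers can be checked with a small sketch. The helper names below are hypothetical and not part of the ETL; the inputs (200 requests per 5 minutes, about 60 entries) come from the text above.

```typescript
// Hypothetical helpers for illustration only; not from the original ETL.

/** Smallest interval (ms) that stays within `maxRequests` per `windowMs`. */
function safeIntervalMs(maxRequests: number, windowMs: number): number {
  return windowMs / maxRequests;
}

/** Total seconds spent sleeping if each entry is followed by one sleep. */
function totalSleepSeconds(entries: number, intervalMs: number): number {
  return (entries * intervalMs) / 1000;
}

const steamSafeMs = safeIntervalMs(200, 5 * 60 * 1000);
console.log(steamSafeMs);                        // 1500
console.log(totalSleepSeconds(60, 250));         // 15
console.log(totalSleepSeconds(60, steamSafeMs)); // 90
```

This only counts sleep time; request latency adds to both totals equally.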

The acceptable risk: Steam doesn't hard-ban on the first rate-limit violation. It returns HTTP 429, and the job logs the error. The games ETL treats review-endpoint failures as non-fatal; the game row is still written, and only the review stats are absent until the next run:

try {
  const r = await getAppReviewSummary(appid);
  // ... write to DB
} catch (err) {
  reviewsFailed++;
  console.error(`! Review fetch failed for appid ${appid}:`, err);
}

The reviewsFailed counter appears in the job log. If I see it climbing consistently, that's the signal to increase the sleep interval. So far I haven't needed to.
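The article describes reading that counter by eye. If the check were ever automated, one possible sketch follows. The function name, the three-run window, and the one-failure threshold are all assumptions, and "failures in every recent run" is only one reading of "climbing consistently".

```typescript
// Hypothetical check; the original job only logs reviewsFailed.
// `history` holds the reviewsFailed count from recent nightly runs, oldest first.
function shouldIncreaseSleep(
  history: number[],
  minRuns = 3,     // assumed: consecutive runs that must show failures
  minFailures = 1, // assumed: failures per run that count as a noisy run
): boolean {
  if (history.length < minRuns) return false; // not enough data yet
  const recent = history.slice(-minRuns);
  return recent.every((failed) => failed >= minFailures);
}

console.log(shouldIncreaseSleep([0, 2, 3, 1])); // true: the last three runs all had failures
console.log(shouldIncreaseSleep([2, 0, 3]));    // false: one clean run in the window
```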

GitHub's REST API is explicit about limits: 60 requests per hour unauthenticated, 5,000 per hour with a personal access token. The GitHub docs on rate limiting cover both the primary limit and the secondary limits for specific endpoint categories. The OSS alternatives ETL makes one GET /repos/:owner/:repo call per alternative project, roughly 3–5 repos per SaaS tool in the seed data. Even a large seed run of 50 tools with 5 alternatives each is only 250 requests.

The 100ms sleep is there as a politeness interval, but authentication is doing the real rate-limit work:

function authHeaders(): Record<string, string> {
  const token = process.env.GITHUB_TOKEN;
  const base: Record<string, string> = {
    Accept: "application/vnd.github+json",
    "X-GitHub-Api-Version": "2022-11-28",
  };
  if (token) base.Authorization = `Bearer ${token}`;
  return base;
}

GITHUB_TOKEN is set in GitHub Actions from a repository secret. Without it, a full seed run would exhaust the 60-requests-per-hour allowance in under a minute. With it, the 5,000/hour ceiling gives comfortable headroom.
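To see how much headroom a run actually has, GitHub reports the current allowance in the x-ratelimit-limit, x-ratelimit-remaining, and x-ratelimit-reset response headers. A minimal sketch of reading them follows. The types and function names are hypothetical, and nothing in the article says the ETL does this.

```typescript
// Minimal header reader so the sketch works with fetch's Headers or a test stub.
interface HeaderReader {
  get(name: string): string | null;
}

interface RateLimitStatus {
  limit: number;     // requests allowed per window
  remaining: number; // requests left in the current window
  resetAt: Date;     // when the window resets (header is UTC epoch seconds)
}

// Hypothetical helper: returns null if any of the three headers is missing.
function readRateLimit(headers: HeaderReader): RateLimitStatus | null {
  const limit = headers.get("x-ratelimit-limit");
  const remaining = headers.get("x-ratelimit-remaining");
  const reset = headers.get("x-ratelimit-reset");
  if (limit === null || remaining === null || reset === null) return null;
  return {
    limit: Number(limit),
    remaining: Number(remaining),
    resetAt: new Date(Number(reset) * 1000),
  };
}

// Hypothetical helper: does a planned batch fit in what is left?
function fitsBudget(plannedRequests: number, status: RateLimitStatus): boolean {
  return plannedRequests <= status.remaining;
}
```

With a fetch response `res`, `readRateLimit(res.headers)` would give the status. A 250-request seed run fits easily in a 5,000 allowance and not at all in a 60 allowance.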

One subtlety: there are two separate GitHub rate limits, the core REST API limit (5,000/hour authenticated) and the search API limit (10 requests per minute unauthenticated, 30 per minute authenticated). The current ETL uses GET /repos/:owner/:repo directly, not search, so the looser core limit applies. If I ever switch to search-based discovery the math changes.

The HuggingFace model registry API (listing models, fetching model metadata) has no hard rate limit that I've been able to find documented, and I haven't hit one in weeks of nightly runs. The ETL fetches up to 100 models in one GET /api/models?limit=100&sort=downloads call, then one detailed fetch per model. That's about 100 rapid-fire requests, no sleep, no 429s.

Part of this is the HUGGINGFACE_TOKEN sent as a bearer token in authenticated requests, which raises whatever ceiling exists. Part of it is that the registry API is explicitly designed for automated tooling at batch scale: it's the primary way model cards, metadata scrapers, and leaderboard tools consume the catalog.

function authHeaders(): Record<string, string> {
  const token = process.env.HUGGINGFACE_TOKEN;
  return token ? { Authorization: `Bearer ${token}` } : {};
}

If I scale to 1,000 models per nightly fetch I'd add a 50ms sleep as a precaution. For 100, the simplest thing that works is also the correct thing.

| API | Sleep | Auth impact | Failure mode | Fatal? |
| --- | --- | --- | --- | --- |
| Steam appdetails | 250ms | None (public) | 429, occasional | Non-fatal |
| Steam reviews | 250ms (shared) | None (public) | 429, more frequent | Non-fatal |
| GitHub REST | 100ms | 60→5,000/hr | 403, clear message | Non-fatal |
| HuggingFace registry | None | Raises ceiling | Rare 429 | Non-fatal |

All four code paths are non-fatal. A 429 or connection error anywhere in the batch writes a fallback-template row to Turso and increments a counter. The content upgrade loop picks up any gaps the next night.
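The pattern described above, where a failed call produces a fallback value and bumps a counter instead of aborting the batch, could look roughly like this. It is a sketch only: fetchOrFallback and BatchCounters are hypothetical names, and the real ETL writes its fallback row to Turso rather than returning a value.

```typescript
// Hypothetical shape of a non-fatal fetch step; names are illustrative.
interface BatchCounters {
  failed: number;
}

async function fetchOrFallback<T>(
  fetcher: () => Promise<T>,
  fallback: T,
  counters: BatchCounters,
): Promise<T> {
  try {
    return await fetcher();
  } catch (err) {
    // Any error (429, network failure, bad JSON) degrades to the fallback.
    counters.failed++;
    console.error("! Fetch failed, using fallback:", err);
    return fallback;
  }
}
```

The caller keeps iterating after a failure, and the counter ends up in the job log at the end of the run.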

The sleep interval is a guess. What actually protects the ETL from being useless after a rate-limit event is that failures are cheap. Every external API call in this stack is wrapped in a try/catch that writes degraded content rather than crashing the batch. The sleep interval controls how likely you are to hit a rate limit; the fallback chain controls what happens when you do.

For indie-scale ETL — tens to hundreds of entries per night — the combination of a conservative-ish sleep and a non-fatal error path is enough. If the site grows to thousands of entries per run, I'd rethink both: moving to a queue-bounded concurrent fetcher with exponential backoff, and separating the content generation from the data fetch into stages that can be retried independently.
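As a rough idea of the exponential-backoff half of that plan, here is a minimal sketch. Everything in it is an assumption (the 250ms base, the 8-second cap, four attempts, the helper names), it omits jitter, and it does not implement the bounded queue.

```typescript
// Hypothetical retry helper; base delay, cap, and attempt count are assumptions.
function backoffDelayMs(attempt: number, baseMs = 250, capMs = 8000): number {
  // attempt 0 -> baseMs, attempt 1 -> 2 * baseMs, and so on, capped at capMs
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 4,
  baseMs = 250,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Wait before the next attempt, but not after the final one.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt, baseMs));
      }
    }
  }
  throw lastError; // every attempt failed; let the caller's fallback handle it
}
```

A production version would likely retry only on 429 and transient network errors and honor a Retry-After header when the API sends one.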

Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.
