How I built a three-tier content quality ladder for programmatic directory ETL

wpnews.pro

The three directory sites I launched in April — Top AI Tools, Find Games Like, and Open Alternative To — all generate editorial content the same way: fetch metadata from an external API, send it through Claude Haiku 4.5, write the result to Turso. But that description skips the part that actually matters for a programmatic site at scale: what happens when Claude can't run.

The answer is a content quality ladder with three tiers, tracked by a single model_used

column.

Every content table across all three sites has a model_used

column. It takes one of three values:

Value	Origin	Quality
`seeded-from-json`
Loaded from a curated JSON file at bootstrap	Minimal — structured but thin
`fallback-template`
Claude unavailable or API key absent	Acceptable — technically correct, not editorial
`claude-haiku-4-5`
Generated by Claude Haiku 4.5	Target — editorial summaries, named examples, nuanced caveats

Seeded content exists because each site ships with a JSON file of curated entries. Those entries have names, descriptions, and metadata from their upstream source (HuggingFace, Steam, GitHub), but no editorial layer yet. The page renders — but it reads like a database dump, not a directory.

Fallback-template content is what you get when the API key isn't present or when a Claude call fails. For the AI tools site, the fallback for a model named qwen2-7b

in the text-generation

pipeline looks like this:

qwen2-7b is an open-source text-generation model available on HuggingFace.
Details are sourced from the public model registry.

That's not wrong. It just doesn't help anyone decide whether to use the model.

Claude Haiku content is the target state. A good generation for the same model says something like: "Qwen2-7B is a 7-billion parameter instruction-tuned model from Alibaba Cloud optimized for multilingual generation, showing strong performance on Chinese and English benchmarks while fitting in 16GB of VRAM." The difference is editorial voice and specificity — neither of which template-filling can produce.

The ETL generation step doesn't blindly regenerate everything on each run. It targets only entries that need work:

SELECT m.id, m.name, m.pipeline_tag, m.tags
FROM models m
LEFT JOIN model_content c ON c.model_id = m.id
WHERE c.model_id IS NULL
   OR c.model_used IN ('fallback-template', 'seeded-from-json')
ORDER BY m.downloads DESC
LIMIT ?

Three things happen simultaneously here:

LEFT JOIN ... WHERE c.model_id IS NULL

catches brand-new entries added by the nightly fetch that have no content row yet.OR c.model_used IN ('fallback-template', 'seeded-from-json')

catches existing rows that were written with lower-quality content.ORDER BY m.downloads DESC

means when the LIMIT is hit, the most-downloaded (most-visited) entries are upgraded first.This identical query pattern appears in all three sites with different table names: models

/model_content

for AI tools, games

/game_content

for indie games, saas

/saas_content

for OSS alternatives. The abstraction was a late realization — I wrote it three times before noticing it was the same thing. A shared buildUpgradeQuery(tableName, pkField, contentTable)

helper would have been the right call from the start.

Inside the generation loop, every entry goes through the same decision tree:

const hasApiKey = !!process.env.ANTHROPIC_API_KEY;

if (hasApiKey) {
  try {
    const result = await generate({
      systemPrompt: SYSTEM_PROMPT,
      userPrompt,
      cacheSystem: true,
      maxTokens: 1024,
    });
    content = parseOrFallback(result.text, fb);
    modelUsed = "claude-haiku-4-5";
    generated++;
  } catch (err) {
    console.error(`! Claude error for ${id}:`, err instanceof Error ? err.message : err);
    content = fb;
    fallback++;
  }
} else {
  content = fb;
  fallback++;
}

The cacheSystem: true

flag marks the system prompt block with cache_control: { type: "ephemeral" }

. All three sites have fixed system prompts — the same AI tools instruction across every model generation, the same game critic instruction across every game — so the first call in a batch primes the cache and the remaining ~99 calls read it at the reduced input rate. I covered the mechanics in the article on the shared Haiku client. With a ~900-token system prompt and 100 entries per run, the cache saves roughly 90,000 input tokens per nightly run. Anthropic's prompt caching documentation has the exact pricing for cache creation vs cache read tokens.

The error path is deliberately non-throwing. Any Claude failure — rate limit, network timeout, malformed response — drops through to content = fb

and increments fallback

. The run continues. If 10 of 100 Claude calls fail due to transient rate limits, 90 get written with claude-haiku-4-5

and the 10 failures get fallback-template

. Those 10 rows surface in the next night's upgrade query automatically.

Every content row is written with INSERT ... ON CONFLICT ... DO UPDATE SET

:

INSERT INTO game_content
  (appid, summary, similar_games, good_for, avoid_if, generated_at, model_used)
VALUES (?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(appid) DO UPDATE SET
  summary = excluded.summary,
  similar_games = excluded.similar_games,
  good_for = excluded.good_for,
  avoid_if = excluded.avoid_if,
  generated_at = excluded.generated_at,
  model_used = excluded.model_used

The upsert makes the ETL fully idempotent: running it twice produces the same state as running it once. More importantly, it means the model_used

column gets overwritten when an upgrade succeeds. A row that was fallback-template

becomes claude-haiku-4-5

in-place, without any explicit "mark upgraded" step. The column just reflects what actually produced the current content.

The compare-page ETL uses a different pattern: check-before-insert with an explicit SELECT 1

to skip already-generated pairs. Both patterns are valid. Check-before-insert is better when reprocessing is expensive (large Claude calls, multi-step generation). Upsert-overwrite is better when you always want the latest generation to win regardless of what was there before.

One consequence of shipping a three-tier system is that some pages launch with genuinely thin content. For the indie games site, the threshold is explicit in the game page component:

const noindex =
  game.good_for.length === 0 &&
  game.avoid_if.length === 0 &&
  game.similar_games.length === 0;

If a game entry has no good_for

audience signals, no avoid_if

caveats, and no similar game suggestions — which happens when the content row is missing entirely, not just fallback-template — the page gets noindex

in its robots meta. The page renders fine for direct visitors; it just isn't submitted to Search Console until content exists.

In practice, the fallback templates do populate good_for

and avoid_if

with generic strings like "Indie game enthusiasts" and "You prefer AAA production values," so most fallback-template entries still pass the noindex check. The valve fires mainly on completely-missing rows, which are brief windows between when the fetch ETL adds a new game and when the generation ETL runs next.

After generation, a separate export.ts

script dumps the content tables to static JSON files that Astro reads at build time. This is the architectural detail that makes the quality ladder safe to run asynchronously.

If the Anthropic API is down for an entire nightly run, the export runs with whatever's in the DB, the Astro build succeeds with existing content, and the deployed site doesn't have zero-content pages. The upgrade queue just has a larger backlog the following night.

The static SSG approach I'm running across all three sites is partly justified by this property. Dynamic rendering from a live DB would mean a Claude outage or Turso blip directly impacts page load time for real users. The ETL → export → build pipeline adds ~24 hours of content staleness in exchange for availability that doesn't depend on the API being up at request time. For a directory site where model descriptions change rarely, that tradeoff is easy to accept.

The generation loop is strictly sequential. One call, await, write to DB, next entry. For 100 entries at roughly 1–1.5 seconds per call that's about 2 minutes per run — fine for the current scale.

At 1,000 entries it would be 20+ minutes, which starts blocking the rest of the GitHub Actions job. The fix is a semaphore-bounded batch:

import PQueue from "p-queue";
const queue = new PQueue({ concurrency: 5 });

const tasks = pending.rows.map((row) =>
  queue.add(() => generateAndWrite(row))
);
await Promise.all(tasks);

Five concurrent workers would bring a 1,000-entry run down to under 5 minutes without risking the Anthropic rate limit. I've kept the sequential version because it's simpler to debug and the current batch sizes don't need it, but I'll add the queue before growing any site past ~300 entries.

I also wish I'd started with better fallback copy. The initial seed templates are technically correct but thin, and some of that thin content shipped live to indexable pages before the ETL had a chance to upgrade it. A cleaner v1 strategy: run the full ETL before the first Astro build so every page that ships has at least a real Claude generation. The seeded-from-json tier exists because I moved too fast at launch; it's not architecturally necessary.

Can I run the ETL without an API key during local development?

Yes. The hasApiKey

check means every generation falls through to fallback-template

. All DB writes still happen, the export still runs, and the Astro build succeeds. Once you add a real key, the next ETL run upgrades all fallback-template

rows automatically without any manual intervention.

How do I check the current upgrade ratio?

SELECT model_used, COUNT(*) as cnt
FROM game_content
GROUP BY model_used;

A healthy site a week after launch should have mostly claude-haiku-4-5

rows with fallback-template

count trending toward zero. The generated_at

timestamp on each row also lets you see how recently content was last upgraded.

What happens when Claude returns malformed JSON?

Each site's parseOrFallback()

function extracts the outermost {...}

block with a regex before parsing — this handles the common case where Haiku prepends an explanation like "Here is the entry:" before the actual JSON. All field accesses after the parse are null-safe and fall back to the fallback struct individually if a field is wrong type or missing. The row still gets written; model_used

records whichever tier actually filled the content.

Does the cache persist between separate nightly runs?

No. Anthropic's ephemeral cache TTL is 5 minutes. Within a single run of 100 entries, the 99 calls after the first hit the cache. Across runs scheduled hours apart, the cache has expired and the first call re-primes it. The savings are per-batch, not cross-run — still meaningful for batches of 100, but not a persistent cost reduction over time.

Why Turso for this instead of Postgres?

I covered the comparison in detail in the Turso vs Cloudflare D1 article. The short version for this use case: @libsql/client

works identically in Node.js ETL scripts and at Astro serverless/edge, with no separate driver or connection-pooling setup for each environment. For a project where the same getClient()

call needs to work in GitHub Actions jobs and Vercel edge functions, that's the practical reason to use it.

Part of an ongoing 6-month experiment running three AI-curated directory sites. The technical claims here are real; this article was AI-assisted.

source & further reading

dev.to — original article Perché l’“harness” batte il modello: lezioni pratiche dai sistemi multi‑agente Substack's New AI Detector Has the Same Blind Spot DEV.to's Did Cisco's Antares Models Cost $1 to Find Security Bugs. GPT-5.5 Costs $141

How I built a three-tier content quality ladder for programmatic directory ETL

Run your AI side-project on zahid.host