# Building domain-specific AI chatbots with Claude

> Source: <https://gregwilson.tech/building-domain-specific-ai-chatbots>
> Published: 2026-06-26 03:19:33+00:00

I keep building the same thing: a chatbot that’s a genuine expert in exactly one niche and politely useless outside it. Ask it anything in its lane and it’s sharp, opinionated, and sourced. Ask it to write you a Python script or summarize the news and it declines and steers you back.

There are four of them now:

- **Jazz** ([JazzQuery](https://jazzquery.com)): musicians, sessions, sidemen, labels, discographies.
- **Porsche** ([RennQuery](https://rennquery.com)): models, generations, specs, motorsport, the people.
- **Watches** ([WristQuery](https://wristquery.com)): brands, references, movements, complications, collecting.
- **Celiac disease** ([Celiac vs Me](https://celiacvsme.com)): a chat companion for living with celiac disease, grounded in federal medical references *and* my own book of the same name.

They look like four different products. Under the hood they’re the same machine with the paint swapped. This post is that machine: the web architecture, the two-model trick that keeps the bills small, the curated-source approach that keeps the answers honest, and the per-site knowledge bases that make a general-purpose model behave like a specialist — including a local mirror of a chunk of Wikipedia, a 33-million-row jazz discography, and a full book manuscript loaded into a search index.

As always, these are evening-and-weekend projects squeezed around a day job — which is only realistic because I don’t write the code by hand anymore. I do the architecture, the judgment calls, and the data curation; [Claude Code](https://www.anthropic.com/claude-code) does the building. Running *four* of these without it becoming a second job — or, so far, a real expense — is the whole point of the design. (And if you want to spin up your own, there’s a [build kit](/domain-chatbot-build-kit.md) at the end.)

## The four sites at a glance

| Site | Domain | The hard part |
|---|---|---|
| JazzQuery | jazz | “Who played drums on this session, and did these three ever record together?” Answered from a local copy of the Discogs catalog, not the model’s memory. |
| RennQuery | Porsche | Decades of generations and engine codes where a confident-but-wrong answer is worse than no answer. Grounded in a local Wikipedia mirror. |
| WristQuery | watches | Reference numbers, caliber specs, and a market where “current value” genuinely needs a live web search, but most questions don’t. |
| Celiac vs Me | celiac disease | Medical accuracy plus lived experience. The model leans on public-domain federal guidance for the facts and my book for the human side. |

The differences are all in the **knowledge** layer. Everything around it — the web app, the model routing, the guardrails, the cost controls — is shared.

## One framework, four facades

Every site is the same stack:

| Layer | What and why |
|---|---|
| Client | React 19 + Vite (SPA). A real-time chat UI with streaming, citations, and feedback isn’t a static page. |
| Server | A serverless backend that can stream; `main` auto-deploys. |
| Session | A per-session stateful object. |
| Analytics + knowledge | SQLite, with FTS5 full-text search. |
| Chat plumbing | A WebSocket router + the [Vercel AI SDK](https://sdk.vercel.ai/); `streamText` handles the model call, tool loop, and streaming. |

The request flow for a single question:

```
Browser  ──WebSocket──▶  Server ──▶  Session object (one per browser)
                                         │
                                         ├─ 1. cached answer?  (zero-cost fast path)
                                         ├─ 2. rate-limited?   (30/hour per session)
                                         ├─ 3. Haiku gate      (on-topic? needs the web?)
                                         └─ 4. Sonnet + tools  (search_wiki / discogs / web)
                                         │
Browser  ◀──streamed text + citations──┘
```

The heart of it is one `streamText` call. Simplified, but this is really the shape:

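```js
import { streamText, stepCountIs } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

// MAIN_MODEL, CACHED_SYSTEM, history, searchWikiTool, db, gate, allowedDomains
// and logTurnUsageAndCost are the app's own helpers, defined elsewhere.
const result = streamText({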
  model: anthropic(MAIN_MODEL),          // claude-sonnet-4-6
  messages: [CACHED_SYSTEM, ...history],
  tools: {
    search_wiki: searchWikiTool(db),   // free, always on
    // attach web_search ONLY when the gate flagged the turn FRESH:
    ...(gate.needsSearch
      ? { web_search: anthropic.tools.webSearch_20260209({
            maxUses: 1, allowedDomains,
          }) }
      : {}),
  },
  stopWhen: stepCountIs(6),              // bound the tool/reasoning loop
  onFinish: logTurnUsageAndCost,
});
```

Responses stream back token-by-token over the socket as Server-Sent-Event-style chunks, with source citations emitted as their own `source-url` parts so the UI can render them as little pills under each answer. The client paces the reveal at a steady ~220 characters/second so it reads like typing instead of bursting in jerky clumps.
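The pacing code isn't shown, so here is a minimal sketch of the idea under stated assumptions: streamed chunks go into a buffer, and a per-frame `take(elapsedMs)` releases characters at a fixed rate. `createPacer` and its methods are hypothetical names; the 220 chars/second default is the figure from the text.

```js
// Minimal typing pacer: buffers streamed text and releases it at a fixed
// characters-per-second rate, so bursty network chunks render smoothly.
// createPacer/push/take/pending are illustrative names, not the sites' code.
function createPacer(charsPerSecond = 220) {
  let buffer = ""; // streamed text not yet shown
  let credit = 0; // characters owed from elapsed time, including fractions

  return {
    // Call whenever a streamed chunk arrives from the socket.
    push(chunk) {
      buffer += chunk;
    },
    // Call once per frame with the ms since the previous call; returns the
    // slice of text to append to the visible answer.
    take(elapsedMs) {
      credit += (elapsedMs / 1000) * charsPerSecond;
      const n = Math.min(Math.floor(credit), buffer.length);
      credit -= n;
      // Don't bank credit while the buffer is drained, or the next chunk
      // would burst out all at once.
      if (n === buffer.length) credit = Math.min(credit, 1);
      const out = buffer.slice(0, n);
      buffer = buffer.slice(n);
      return out;
    },
    pending() {
      return buffer.length;
    },
  };
}
```

In a browser you would drive `take` from `requestAnimationFrame`, passing the time since the previous frame; the same pacer works for replaying a cached answer.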

A nice side effect of the shared design: when I learn something on one site, all four get it. The “never narrate a lookup” rule I’ll describe below started as a JazzQuery bug fix and is now in every system prompt.

## Two models: a cheap bouncer and a smart expert

The single most important design decision is that **two different Claude models** handle every question, and the cheap one runs first.

- **[Claude Haiku 4.5](https://www.anthropic.com/claude)** is the bouncer. It runs on *every* message and decides, in one word, whether the question is even in scope.
- **[Claude Sonnet 4.6](https://www.anthropic.com/claude)** is the expert. It only runs once Haiku waves the question through, and it’s the one with the tools, the long system prompt, and the real cost.

Why bother? Cost asymmetry. A full answer costs me about two cents — and as much as fifteen once it’s run a web search or two. A Haiku gate call is a fraction of a cent. Rejecting an off-topic question (or absorbing someone trying to use my jazz site as a free coding assistant) is **tens to hundreds of times cheaper** than answering it. The bouncer makes abuse boring and cheap instead of expensive.

The other three sites use a simple `YES` / `NO` / `MAYBE` topic gate. WristQuery has the most evolved version, where a single Haiku call does double duty, gating *and* deciding whether the answer needs the live web:

```js
// Haiku returns exactly one word: OFF, KNOWN, or FRESH.
// (The wrapper function name is illustrative.)
function parseGate(reply, isFirstUserMessage) {
  const up = reply.toUpperCase();
  const decision = /FRESH/.test(up) ? "FRESH" : /KNOWN/.test(up) ? "KNOWN" : "OFF";
  // First turn must be KNOWN/FRESH; short follow-ups inherit the topic.
  const ok = isFirstUserMessage
    ? decision === "KNOWN" || decision === "FRESH"
    : decision !== "OFF";
  return { ok, needsSearch: decision === "FRESH" };
}
```

- `OFF` → refuse (not about watches, or it’s a “write me code” request in disguise).
- `KNOWN` → answer from the model’s own training. **No web search tool is even attached.**
- `FRESH` → needs current data (a price, availability, this year’s release), so the web search tool gets bolted on for that turn only.

That `needsSearch` flag is why the `web_search` tool was conditionally spread into the `tools` object in the snippet above. Most watch questions are `KNOWN` (“what’s the difference between a 116610 and a 126610 Submariner” doesn’t need the internet), so most turns never pay for a search at all.

Sonnet’s system prompt is where each site gets its personality and its limits. They all share the same skeleton — **Scope**, **Voice**, **Style**, **Tool use** — and one hard-won rule that I’ll quote because it cost Claude Code and me a real debugging session to land on:

> Be decisive, and NEVER narrate a lookup you are about to do… a turn that ends on a promise (text with no tool call) stops generation and strands the user with no answer.

Early on, the model would sometimes end a message with “Let me look that up for you…” and then just… stop. The tool call never came, because the turn was over. To the user it looked like the bot froze mid-sentence. The fix lives in the prompt: tool calls are invisible, so never announce them, and every turn must *end* on a real answer, never on a plan to continue.

A couple of other knobs keep Sonnet in line and cheap:

- The whole system prompt is sent with `cacheControl: { type: "ephemeral" }`, so Anthropic’s [prompt caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching) bills the repeated prefix at ~10% of the normal input rate.
- `stepCountIs(6)` caps the tool-use loop so a confused turn can’t spiral into a dozen searches.
- All calls go to the Anthropic API directly via the AI SDK.
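For reference, one way to attach that cache marker with the Vercel AI SDK is a `providerOptions` entry on the system message. This is a sketch assuming AI SDK 5's Anthropic provider conventions; the prompt text is a placeholder, and the post doesn't show how its `CACHED_SYSTEM` is actually built.

```js
// Sketch of a cacheable system message in the shape the AI SDK's Anthropic
// provider accepts. The section texts are placeholders for a site's real
// Scope / Voice / Style / Tool use prompt.
const SYSTEM_PROMPT = [
  "Scope: (site-specific)",
  "Voice: (site-specific)",
  "Style: (site-specific)",
  "Tool use: Be decisive, and NEVER narrate a lookup you are about to do.",
].join("\n\n");

const CACHED_SYSTEM = {
  role: "system",
  content: SYSTEM_PROMPT,
  // Marks this prefix as cacheable; later turns that resend the identical
  // prefix are billed at the discounted cache-read rate.
  providerOptions: {
    anthropic: { cacheControl: { type: "ephemeral" } },
  },
};
```

Anthropic only caches prefixes above a model-specific minimum length, so if a short prompt never shows cache hits in the usage numbers, that threshold is the first thing to check in the prompt-caching docs.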

## Curating the web instead of trusting all of it

When one of these bots *does* search the web, it can’t just search *the web*. Open web search is a hallucination delivery mechanism: SEO spam, content farms, and confidently wrong forum posts all rank, and the model happily cites them.

So each site ships a hard **allowlist** of domains it’s allowed to search. WristQuery’s, trimmed:

``` js
const SEARCH_ALLOWED_DOMAINS = [
  // Reference & specs
  "wikipedia.org", "watchbase.com", "calibercorner.com", "ranfft.org",
  // Brand-official
  "rolex.com", "omegawatches.com", "patek.com", "audemarspiguet.com",
  // Editorial & journalism
  "hodinkee.com", "fratellowatches.com", "monochrome-watches.com", "watchtime.com",
  // Community & forums
  "watchuseek.com", "rolexforums.com", "reddit.com",
  // Market, values & auctions
  "chrono24.com", "watchcharts.com", "phillips.com", "christies.com", "sothebys.com",
];
```

RennQuery’s list is Porsche.com, Rennlist, Total 911, Bring a Trailer, Hagerty, and the automotive press. JazzQuery’s is AllMusic, JazzDisco, the Smithsonian, Blue Note, ECM, DownBeat, and NPR. Celiac vs Me’s is Mayo Clinic, Cleveland Clinic, NIDDK, celiac.org, Beyond Celiac, and the Gluten Free Watchdog. The list *is* the editorial judgment.

The allowlist is handed straight to the search tool, which filters results before they ever reach the model:

```js
anthropic.tools.webSearch_20260209({ maxUses: 1, allowedDomains: SEARCH_ALLOWED_DOMAINS })
```

There’s one production gotcha worth sharing. Some of those domains block Anthropic’s search crawler, and when that happens the whole request fails with a hard `400` rather than just skipping the site. So there’s a small retry loop: catch the “these domains are blocked” error, drop the offenders, remember them in the session object’s SQLite so the next turn doesn’t repeat the mistake, and retry, invisibly to the user. If every domain ends up blocked, it falls back to an unrestricted search rather than breaking.
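A sketch of that retry loop, written against injected functions so it stays independent of any SDK. The error shape is an assumption (the real `400` body isn't shown), so `parseBlockedDomains` is a hypothetical parser, and the `Set` stands in for the blocked-domain list kept in the session's SQLite.

```js
// Sketch of the blocked-domain retry loop. Assumptions (hypothetical):
//   runSearch(allowedDomains): performs the model call; undefined means an
//     unrestricted search.
//   parseBlockedDomains(err): returns the domains named in a "blocked"
//     error, or [] for any other failure.
//   blocked: a Set persisted per session so later turns skip known offenders.
async function searchWithDomainFallback(runSearch, domains, blocked, parseBlockedDomains) {
  let allowed = domains.filter((d) => !blocked.has(d));
  for (;;) {
    // Every allowlisted domain is blocked: fall back to an open search.
    if (allowed.length === 0) return runSearch(undefined);
    try {
      return await runSearch(allowed);
    } catch (err) {
      const offenders = parseBlockedDomains(err);
      if (offenders.length === 0) throw err; // unrelated failure: surface it
      offenders.forEach((d) => blocked.add(d)); // remember for later turns
      const before = allowed.length;
      allowed = allowed.filter((d) => !blocked.has(d));
      // If the error named nothing we were still sending, retrying would
      // loop forever, so give up instead.
      if (allowed.length === before) throw err;
    }
  }
}
```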

## Mirroring Wikipedia locally to skip the toll

Here’s the realization that shaped RennQuery and WristQuery: for a domain like Porsche history or horology, **most of the good factual content lives in a few hundred Wikipedia articles that almost never change.** Paying for a metered web search every time someone asks about the 997.2’s engine, when the answer is sitting in the same Wikipedia article it was in last month, is silly.

So I have Claude Code copy those articles into a local table for me. The build script it wrote (`fetch-wiki-corpus.mjs`) walks a curated set of seed titles and category trees (skipping noise like clocks and smartwatches), pulls the article text, chunks it, and writes a SQL file that gets loaded into the database. The schema is a [SQLite FTS5](https://www.sqlite.org/fts5.html) full-text index:

```sql
CREATE TABLE wiki_articles (page_id INTEGER PRIMARY KEY, title TEXT, url TEXT, fetched_at INTEGER);

CREATE VIRTUAL TABLE wiki_fts USING fts5(
  title, section, content, url UNINDEXED,
  tokenize = 'unicode61 remove_diacritics 2'
);
```

The model gets a `search_wiki` tool backed by it. FTS5’s `MATCH` syntax chokes on punctuation like `997.1`, so the query is reduced to bare word tokens, OR-ed together, and ranked by [BM25](https://en.wikipedia.org/wiki/Okapi_BM25):

```js
const { results } = await db
  .prepare(`SELECT title, section, url, content, rank
            FROM wiki_fts WHERE wiki_fts MATCH ? ORDER BY rank LIMIT ?`)
  .bind(buildFtsQuery(query), WIKI_RESULT_LIMIT)   // e.g. "997" OR "1" OR "turbo"
  .all();
```
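`buildFtsQuery` itself isn't shown. A minimal sketch consistent with the comment in the snippet above (bare word tokens, quoted, OR-ed together) might look like this; the name comes from the post, but the body is an assumption.

```js
// Hypothetical buildFtsQuery: reduce free text to bare word tokens, quote
// each one so FTS5 treats it as a plain string, and OR them together.
function buildFtsQuery(text) {
  // \p{L} and \p{N} keep letters and digits in any script and drop the
  // punctuation (., -, :, quotes) that FTS5's MATCH grammar chokes on.
  const tokens = text.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? [];
  const unique = [...new Set(tokens)];
  // An empty result means there is nothing to search; callers should skip
  // the query rather than run an empty MATCH.
  return unique.map((t) => `"${t}"`).join(" OR ");
}
```

Quoting each token keeps FTS5 from reading anything in it as query syntax, and the `unicode61` tokenizer in the schema already folds case, so lowercasing here only helps de-duplication.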

The economics are the whole point. A `search_wiki` hit costs only the tokens it returns: **no per-search fee and no 10,000-token payload of raw web results**. So the system prompt steers the model here *first* for anything historical or spec-related, and reserves the paid web search for things that genuinely change: prices, availability, this year’s news.

If copying a slice of Wikipedia onto my own server sounds legally sketchy — it isn’t. Wikipedia’s text is published under the [Creative Commons Attribution-ShareAlike license](https://en.wikipedia.org/wiki/Wikipedia:Reusing_Wikipedia_content) (CC BY-SA), which explicitly permits reuse (commercial use included) on two conditions: **attribution** (credit Wikipedia and link back to the article) and **share-alike** (if you redistribute an adapted version, you license it under the same terms). The attribution is built in: every chunk the model retrieves carries its source URL and gets cited in the answer, exactly like a web-search result.

(One subtlety: because every returned chunk becomes a citation, a weak query that only matched common words like “the” or “911” would attach irrelevant articles to the answer. So results scoring worse than half the top hit’s BM25 rank get dropped before they become citations.)
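The cutoff in that parenthetical is a one-liner once you remember FTS5's sign convention. A sketch, assuming rows arrive best-first from `ORDER BY rank` with a numeric `rank` field; `dropWeakHits` is a hypothetical name, and the 0.5 ratio is the "half the top hit" rule from the text.

```js
// Relevance cutoff sketch. FTS5's bm25() (and the default `rank` column)
// is negative, with more negative meaning a better match, so the first row
// from ORDER BY rank is the strongest. Keep rows at least `ratio` as strong.
function dropWeakHits(rows, ratio = 0.5) {
  if (rows.length === 0) return rows;
  const threshold = rows[0].rank * ratio; // e.g. top -12.0 -> cutoff -6.0
  return rows.filter((r) => r.rank <= threshold);
}
```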

## JazzQuery’s 33-million-row discography

Wikipedia mirroring works when the knowledge is prose. Jazz isn’t prose — it’s a *database*. “Who played tenor on this date,” “did McCoy Tyner and Elvin Jones ever record outside Coltrane’s group,” “what’s the personnel on this album” are structured queries, and asking a language model to answer them from memory is how you get confident, beautifully worded, completely invented sidemen.

So JazzQuery doesn’t ask the model to remember. It ships a copy of the [Discogs](https://www.discogs.com/) catalog.

Discogs publishes a monthly data dump of its entire database. An ETL pipeline filters it down to jazz, normalizes it, and loads it into a read-only SQLite database (**about 33 million rows, ~1.6 GB**): artists, their aliases, labels, masters (the canonical “work”), releases (specific pressings), per-track listings, and the payload table, `credits`, the “who played what on which release” edge list.

```sql
CREATE TABLE credits (
  release_id INTEGER NOT NULL,   -- -> releases.id
  artist_id  INTEGER NOT NULL,   -- -> artists.id
  role       TEXT NOT NULL,      -- raw: 'Saxophone [Tenor]'
  instrument TEXT,               -- normalized: 'saxophone'
  track_pos  TEXT                -- per-track scope, or NULL = whole release
);
```

“Jazz” turned out to be the interesting filtering problem. A genre tag alone misses too much, so inclusion happens in three layers, recorded in each master’s `inclusion` column (`'genre' | 'style' | 'artist'`): records tagged Jazz; records in adjacent jazz *styles*; and (the fun one) **records that aren’t jazz at all but credit a jazz musician as a player.** That last layer is why JazzQuery knows Steve Gadd’s session work shows up on Paul Simon and Steely Dan records, not just on jazz albums.

Each artist also carries a `jazz_score` (how many jazz credits they have), which both drives that expansion and disambiguates names. Ask about “Steve Gadd” and you get the drummer, not the score-zero namesake.
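Here is one way the three layers and the disambiguation could be expressed. The data shapes are assumptions, `JAZZ_STYLES` is a small hypothetical sample rather than the pipeline's real list, and `jazzArtistIds` would presumably be built from artists with credits on genre- or style-matched records.

```js
// Sketch of the three inclusion layers and jazz_score disambiguation.
// Object shapes and the style sample are assumptions, not the real ETL.
const JAZZ_STYLES = new Set(["Bop", "Hard Bop", "Modal", "Cool Jazz", "Fusion"]);

// Returns the value stored in masters.inclusion, or null to leave it out.
function classifyInclusion(master, jazzArtistIds) {
  if (master.genres.includes("Jazz")) return "genre";
  if (master.styles.some((s) => JAZZ_STYLES.has(s))) return "style";
  // Not a jazz record, but a jazz musician is credited as a player on it.
  if (master.creditedArtistIds.some((id) => jazzArtistIds.has(id))) return "artist";
  return null;
}

// Among artists who share a name, prefer the one with the most jazz credits.
function pickArtist(candidates) {
  if (candidates.length === 0) return null;
  return candidates.reduce((best, a) => (a.jazz_score > best.jazz_score ? a : best));
}
```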

The model reaches the database through a `discogs_lookup` tool with five query shapes: `artist`, `discography`, `personnel`, `album`, and `collaboration`. The collaboration one is my favorite, because it answers “did all of these people record together” as a single set intersection instead of making the model cross-reference three separate discographies in its head:

```js
// Did N artists all appear on the same release? One credit intersection.
// (rows(db, sql, ...params) is assumed to be a small helper returning result rows.)
const where = ids.map(() =>
  "r.id IN (SELECT release_id FROM credits WHERE artist_id = ?)"
).join(" AND ");
const shared = await rows(db,
  `SELECT DISTINCT m.title, m.year FROM masters m
   JOIN releases r ON r.master_id = m.id
   WHERE ${where} ORDER BY m.year LIMIT 40`, ...ids);
```

Every result links back to the real Discogs page it came from, so the citations are verifiable. And the system prompt is strict: for personnel and “did they record together” questions, the model **must** call the tool and answer from it — never from memory.

Two production details I’m a little proud of — both of which took some back-and-forth with Claude Code to get right:

**Read replicas.** A 1.6 GB database has real query latency if every read crosses regions, so the queries run against an in-region read replica with relaxed consistency, which is safe because this data only changes once a month.

**Predictive warm-up.** The managed database evicts an idle instance, and the cold start can be 15–20 seconds, which is brutal in a chat. Claude Code’s first instinct was a cron job pinging the database every few minutes to keep it warm; that works, but it felt kludgy and wasteful to me: paying to poke a database nobody’s using. The idea I landed on instead: when you *start typing*, the client quietly fires a request that wakes the replica and warms its caches. By the time you hit send, the database is hot, and the cold start has hidden itself behind your typing.
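The warm-up trick is described without code. Here is a sketch of the client side, assuming a cheap backend endpoint that touches the replica; `createWarmer`, the `/api/warm` path, and the five-minute cooldown are all hypothetical.

```js
// Client-side warm-up sketch: on a keystroke, fire a cheap request that wakes
// the database replica, then stay quiet for a cooldown so typing doesn't spam
// the backend. `now` is injectable so the timing logic can be tested.
function createWarmer(sendWarmup, cooldownMs = 5 * 60 * 1000, now = Date.now) {
  let lastSent = -Infinity;
  return function onInput() {
    const t = now();
    if (t - lastSent < cooldownMs) return false; // warmed recently; skip
    lastSent = t;
    // Fire and forget: a failed warm-up must never block or break the chat.
    Promise.resolve()
      .then(sendWarmup)
      .catch(() => {});
    return true;
  };
}

// Usage in the chat box (browser):
// const warm = createWarmer(() => fetch("/api/warm", { method: "POST" }));
// input.addEventListener("input", warm);
```

Keying off the `input` event rather than focus means the request only fires when someone is actually composing a question.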

(Loading 33 million rows into a managed SQLite store had its own war stories — Claude Code and I learned the hard way that building an index on a multi-million-row table runs it out of memory, so the loader creates every index on the empty table *first*, then bulk-inserts. But that’s a post of its own.)

## Celiac vs Me: putting my own book in the model’s hands

Two years ago, I was diagnosed with celiac disease and had to learn to navigate a world where I can’t consume even the tiniest amount of gluten. I wrote a book about what I learned and how the transition felt — so for this one, I wanted to bring that knowledge and experience into the chat’s answers.

The celiac companion is the most personal of the four, and it has the most interesting knowledge problem, because the right answer depends on what *kind* of question it is.

“What’s the FDA definition of gluten-free?” is a settled fact with an authoritative source. “How do I get through my first Thanksgiving after diagnosis without feeling like a burden?” is lived experience. The first should never need a web search; the second isn’t in any medical reference at all. So the model has a three-tier sourcing order, and the system prompt tells it which to reach for:

1. **`search_reference`**: public-domain federal material, namely the NIDDK’s celiac pages and the FDA’s gluten-free labeling rule. Government works aren’t copyrighted, so I ingested them into an FTS5 table (32 chunks). A hit here is sub-penny and authoritative, far better than paying to web-search a fact that’s been settled since 2013.
2. **`search_book`**: the full manuscript of my book, *Celiac vs Me: When Gluten Turns Your Food World Upside Down*, loaded into its own FTS5 index. This is where the lived-experience answers come from.
3. **Web search**: only for things that actually change (recalls, current products, new research), restricted to the curated medical allowlist.

Getting the book into the model is a little pipeline of its own, another Claude Code build. The manuscript is a Word document; `ingest-book.mjs` runs it through [mammoth](https://github.com/mwilliamson/mammoth.js) with a style map (so the book’s custom Word heading styles become real `h1`/`h2` markers), splits it into chapters and sections, and packs the paragraphs into ~1,400-character chunks, splitting oversized ones at sentence boundaries. The result is loaded into:

```sql
CREATE VIRTUAL TABLE book_chunks USING fts5(
  content, chapter UNINDEXED, section UNINDEXED, ord UNINDEXED
);
```
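A rough sketch of the packing step described above, assuming paragraphs have already been extracted from mammoth's output as plain strings. The ~1,400-character target is from the post; the greedy strategy, the `packChunks` name, and the simple sentence regex are assumptions (the regex will, for example, split after “Dr.”).

```js
// Split text into sentences, keeping terminal punctuation, closing quotes,
// and trailing spaces; the second alternative catches a final fragment
// with no terminator, so no characters are lost.
function splitSentences(text) {
  return text.match(/[^.!?]*[.!?]+["'”’)]*\s*|[^.!?]+$/g) ?? [text];
}

// Greedily pack paragraphs into chunks of at most maxChars. Paragraphs that
// are too long on their own are packed sentence by sentence instead.
function packChunks(paragraphs, maxChars = 1400) {
  const chunks = [];
  let current = "";

  // Append a piece, closing the current chunk first if it would overflow.
  function add(piece, sep) {
    if (current && current.length + sep.length + piece.length > maxChars) {
      chunks.push(current);
      current = "";
    }
    current = current ? current + sep + piece : piece;
  }

  for (const para of paragraphs) {
    if (para.length <= maxChars) {
      add(para, "\n\n");
    } else {
      // Oversized paragraph: close the open chunk, then pack its sentences.
      if (current) {
        chunks.push(current);
        current = "";
      }
      for (const s of splitSentences(para)) add(s.trim(), " ");
      chunks.push(current);
      current = "";
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

A single sentence longer than the limit still ends up as its own oversized chunk; for a book manuscript that is rare enough to accept.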

The `search_book` tool runs a plain BM25 match and returns up to six passages, each labeled with its chapter:

``` js
const rows = await db.prepare(
  `SELECT chapter, section, content FROM book_chunks
   WHERE book_chunks MATCH ? ORDER BY bm25(book_chunks) LIMIT 6`
).bind(terms.join(" OR ")).all();
```

Because every passage knows its chapter, the model can say “the book digs into this in *The Table*” and the UI links the citation to that chapter’s excerpt page on the site. The chapters even have names that map to the parts of life celiac disease upends — *The Diagnosis*, *The Aisles*, *The Table*, *The Road*, *The Calendar*, *The Body Underneath*, *The Emotional Journey* — so the citations read like a table of contents.

Those excerpt pages are part of a bigger whole. celiacvsme.com isn’t only the chatbot; it’s also a static companion site to the book (an excerpt from every chapter, plus a plain-Markdown mirror of each page) and a gluten-free-living blog, built the same agent-friendly way as this site: structured data and an [llms.txt](https://celiacvsme.com/llms.txt) index included. Claude Code built all of it from my manuscript, which saved me an enormous amount of time and is why the chat’s citations land on real chapter pages instead of dead ends.

Two deliberate choices worth calling out:

**It’s keyword search, not embeddings.** No vector database, no embedding model, no similarity threshold to tune: just SQLite’s full-text index, which the database engine already provides for free. For a single book and a handful of reference pages, BM25 is plenty, and the whole thing is one `MATCH` query. Reaching for a vector store here would have been resume-driven development.

**The manuscript never touches the repo.** The ingest script writes a SQL file that’s gitignored and loaded out-of-band. The book’s text lives in the database the bot queries, not in source control.

One thing newly diagnosed people often miss is that *gluten-free* and *celiac-safe* aren’t the same thing. A product can carry a gluten-free label and still be handled or cross-contaminated in ways that make it unsafe for someone with celiac disease. That gap matters enough that I wrote a bit of code for it: when a question is about something *gluten-free*, the prompt gets quietly augmented to pull in *celiac-safe* too, so the answer draws the distinction instead of letting the reader assume the two are interchangeable.
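That augmentation code isn't shown, so this is a hedged sketch of one way to do it: detect “gluten-free” in the question and append a note asking the model to address celiac safety too. The regexes, the wording of the note, and `augmentQuestion` are all hypothetical.

```js
// Sketch of the "gluten-free vs celiac-safe" nudge. Trigger patterns and
// note text are illustrative; the real prompt wording isn't published.
const GLUTEN_FREE = /\bgluten[\s-]?free\b/i;
const CELIAC_SAFE = /\bceliac[\s-]?safe\b/i;

function augmentQuestion(question) {
  // Only augment when the user said "gluten-free" but not "celiac-safe".
  if (!GLUTEN_FREE.test(question) || CELIAC_SAFE.test(question)) return question;
  return (
    question +
    "\n\n(Also address whether this is celiac-safe, not just labeled " +
    "gluten-free: cross-contact and shared equipment matter.)"
  );
}
```

Appending to the question rather than the system prompt keeps the cached prefix identical across turns.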

And because it’s a medical topic, the system prompt carries a safety layer the other three don’t: it’s explicit that this isn’t medical advice, never tells anyone to start or stop a medication, and nudges people to a clinician when a question crosses into diagnosis or treatment.

## The boring machinery that keeps it cheap and honest

A few cross-cutting pieces show up in all four and are most of the reason I can afford to run them:

| Mechanism | What it does |
|---|---|
| Pre-wired answer cache | The suggested starter questions on each home page are pre-computed and stored in the database. Clicking one streams a cached answer at zero model cost: no gate, no search, no Sonnet. It’s even paced to mimic live typing so you can’t tell. |
| Staggered weekly refresh | A cron job regenerates those cached answers weekly. JazzQuery refreshes at 09:00 UTC Mondays, Celiac at 10:00, deliberately an hour apart, because they share one Anthropic organization and the web-search tool is rate-limited org-wide. (A scaling lesson learned the slightly hard way.) |
| History pruning | Before each turn, the bulky tool results and citations from previous turns are stripped out of the history sent to the model, saving 10,000–15,000 stale tokens on every follow-up. |
| Rate limits | 30 questions/hour per session, plus a 60-requests-per-minute-per-IP cap, so no single visitor can run up a bill. |
| Bot protection | A bot-challenge widget (a CAPTCHA-style check) mints a signed, HttpOnly session cookie; every chat request checks the signature before spending a cent on a model. |
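A minimal sketch of the history-pruning step, assuming the Vercel AI SDK's message shape (assistant content as an array of `text` / `tool-call` parts, tool output in separate `role: "tool"` messages). It keeps user messages and assistant text and drops everything else; `pruneHistory` is a hypothetical name, and it runs over the *previous* turns before the new question is appended.

```js
// Sketch of history pruning over AI SDK-style model messages: earlier
// assistant turns keep their text parts only, and tool messages are dropped.
function pruneHistory(messages) {
  const pruned = [];
  for (const msg of messages) {
    if (msg.role === "tool") continue; // bulky tool results
    if (msg.role !== "assistant" || typeof msg.content === "string") {
      pruned.push(msg); // user/system messages and plain-text replies pass through
      continue;
    }
    // Drop tool-call (and any other non-text) parts from assistant content.
    const kept = msg.content.filter((part) => part.type === "text");
    if (kept.length > 0) pruned.push({ ...msg, content: kept });
  }
  return pruned;
}
```

Stripping calls together with their results matters: Anthropic's Messages API expects every `tool_use` block to be followed by a matching `tool_result`, so keeping one half of the pair would make the request fail.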

None of it stores anything personal — the query and usage logs are anonymous and carry only a coarse, IP-derived city, never the IP itself.

## Paying attention to the cost

I built these for fun, and monetizing never crossed my mind. Then I had Claude Code log the cost of every query and tally it up — and that number got my attention.

Even with all the cost engineering above, none of it makes a query *free*. You saw the math a few sections back: a couple of cents for a typical answer, up to fifteen once it’s had to hit the web. That’s nothing when it’s me and a few friends kicking the tires.
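A sketch of the arithmetic behind a per-turn cost log like the `logTurnUsageAndCost` callback in the first snippet. The prices are illustrative assumptions, not quoted rates, and the usage object is a hypothetical shape you'd map from whatever the SDK reports.

```js
// Per-turn cost sketch. PRICES are illustrative assumptions (USD per million
// tokens, plus a flat per-search fee); check Anthropic's current pricing.
// This ignores the one-time cache-write premium on the first turn.
const PRICES = {
  inputPerMTok: 3.0, // uncached input
  cacheReadPerMTok: 0.3, // ~10% of the input rate
  outputPerMTok: 15.0,
  perWebSearch: 0.01,
};

// usage: { uncachedInputTokens, cachedInputTokens, outputTokens }
function turnCostUSD(usage, webSearches = 0, p = PRICES) {
  const tokenCost =
    (usage.uncachedInputTokens * p.inputPerMTok +
      usage.cachedInputTokens * p.cacheReadPerMTok +
      usage.outputTokens * p.outputPerMTok) / 1e6;
  return tokenCost + webSearches * p.perWebSearch;
}
```

With those assumed prices, a turn with 1,000 uncached and 4,000 cached input tokens plus 600 output tokens comes to about $0.013. A web search adds its fee, and its results also arrive as extra input tokens, which is why searched turns land several times higher.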

It is emphatically *not* nothing if one of these ever catches on. Ten thousand questions in a month (not a big number, for anything with traction) at a nickel apiece is $500, for a site that earns exactly zero.

So for now I’ve done the honest hobbyist thing: set aside a little money, left the doors open, and let people play. No login, no paywall, just the rate limits above, so no single visitor can run up the tab. I also added a tip jar (a few Stripe payment links), though I hold approximately zero illusions about tips covering the bill. It’s an “if this saved you a search-engine rabbit hole, buy me a coffee” gesture, not a business model.

If any of them genuinely takes off, the honest answer is a subscription gate, and the email-gating plan in *What’s next* is the first brick in that wall: a handful of free questions to get someone hooked, then a sign-in before a curious visitor quietly turns into a line item. Everything earlier in this post — the cheap gate, the curated search, the local knowledge bases — exists to push that day as far out as possible. Good architecture buys runway, not immunity.

## Hosting this

Nothing above is really tied to one cloud. Strip the brand names and the app is a handful of generic capabilities: a serverless backend that can stream, a scrap of per-session state, a SQL database with full-text search, static hosting, a bot challenge, and a weekly cron. All three big providers can run every piece — here’s the mapping:

| Capability | AWS | Google Cloud | Cloudflare |
|---|---|---|---|
| Compute + streaming/WebSocket | API Gateway WebSocket + Lambda, or Fargate / App Runner | Cloud Run | Workers |
| Per-session state | DynamoDB, keyed by session | Firestore (or Memorystore) | Durable Objects |
| SQL + full-text search | Aurora Serverless v2 (Postgres `tsvector`), or OpenSearch for BM25 | Cloud SQL / AlloyDB (Postgres) | D1 (SQLite + FTS5) |
| Static assets + CDN | S3 + CloudFront | Cloud Storage + Cloud CDN | Static Assets |
| Bot challenge | AWS WAF CAPTCHA | reCAPTCHA Enterprise | Turnstile |
| Scheduled cache refresh | EventBridge Scheduler → Lambda | Cloud Scheduler → Cloud Run | Cron Triggers |

One row is more interesting than the rest: **the per-session state.** The clean version of this design wants a single addressable object per chat session — one place to hold that user’s history, rate-limit counter, and blocked-domain list without hitting a shared database on every keystroke. One of the three hands you that primitive directly (Durable Objects); on the other two you build it yourself, from stateless compute plus a managed store keyed by the session id. The first is less code; the second is more portable. Pick your trade.
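Here is a minimal sketch of the build-it-yourself version for one piece of that state, the 30-questions-per-hour counter. A `Map` stands in for the managed store (DynamoDB, Firestore); `getSession`, `saveSession`, and `allowQuestion` are hypothetical names, and the fixed one-hour window is an assumption, since the post doesn't say how the window is measured.

```js
// Per-session state on stateless compute: a store keyed by session id.
// The Map stands in for a managed store such as DynamoDB or Firestore.
const store = new Map();

function getSession(sessionId) {
  return (
    store.get(sessionId) ?? {
      history: [],
      blockedDomains: [],
      windowStart: -Infinity, // forces a fresh window on first use
      questionsInWindow: 0,
    }
  );
}

function saveSession(sessionId, session) {
  store.set(sessionId, session);
}

const LIMIT = 30; // questions per window, from the post
const WINDOW_MS = 60 * 60 * 1000; // fixed one-hour window (assumption)

// Returns true if this session may ask another question at time `now`.
function allowQuestion(sessionId, now = Date.now()) {
  const s = getSession(sessionId);
  if (now - s.windowStart >= WINDOW_MS) {
    s.windowStart = now; // start a fresh window
    s.questionsInWindow = 0;
  }
  if (s.questionsInWindow >= LIMIT) return false;
  s.questionsInWindow += 1;
  saveSession(sessionId, s);
  return true;
}
```

Against a real remote store this read-modify-write can race when two requests from one session land at once, so you'd reach for a conditional write or a transaction. That serialization is exactly what a one-object-per-session primitive gives you without extra code.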

For the rest, the closest *spiritual* port is “a small container that holds the streaming connection and talks to a database” — a long-lived service like Cloud Run, Fargate, or App Runner — rather than the fully serverless, one-invocation-per-message path. Both work; the container is just fewer moving parts when the entire job is to keep a socket open and stream tokens down it.

Two things stay constant wherever it runs. The knowledge bases are nothing more than **SQLite plus full-text search**, and every one of these platforms has a stand-in (Postgres’ built-in full-text search is the usual one). And Claude is reachable from all of them — the Anthropic API directly, or through Amazon Bedrock or Google Vertex AI — so “which cloud” and “which model provider” stay independent choices.

## How I built this with Claude Code

I should be clear about who wrote what. I designed the architecture, made the judgment calls, and curated every source — but I didn’t write the code. [Claude Code](https://www.anthropic.com/claude-code) did. My job was to know *what* to build and to catch it when it was wrong; the typing was its problem. That division of labor is the only reason someone with an evening or two a week can ship and run four of these.

Here’s roughly how I drove it — the order each site came together in:

1. **Scaffold the chat shell.** The first prompt describes the architecture (a streaming chat UI, a serverless backend, a per-session object) and nothing domain-specific. Claude Code wires the boilerplate I never want to write again: the socket routing, the token streaming, the session plumbing. By the end of the first sitting you have a working (if clueless) chatbot. This is where you pick your stack; the agent fills it in.
2. **Add the two-model gate.** Then make it cheap: a small model classifies every message before the strong one answers. This is also where Claude Code’s first draft taught me something. Its initial system prompt let the model say “let me look that up…” and then stop, stranding the user. That’s the bug behind the “never narrate a lookup” rule above. I described the failure; it found the fix. Expect that loop constantly: it writes, you watch it break on a real input, you describe what’s wrong, it corrects.
3. **Build the knowledge base.** The domain-specific heart, and where *your* work matters most. You decide the shape (a prose corpus, a structured database, a book) and you curate the sources; Claude Code writes the ingest script and the search tool. The fiddly production lessons surfaced right here (the blocked-domain retry, creating indexes *before* the big bulk-load), and they came out of the two of us staring at failures together, not from me knowing them up front.
4. **Guardrails and cost controls.** Rate limits, the bot challenge, the answer cache, history pruning. Mostly mechanical: state the policy, let it implement. The predictive warm-up trick was just me noticing a cold start in testing and asking, “this is slow on the first query; can we hide it behind the user typing?”
5. **Verify and harden.** Where a human still earns their keep, though I had more help than I expected. When a response came back wrong, or just hung, I didn’t go log-diving myself; I told Claude Code, and it would tail the logs, fire test queries at the backend, and iterate until it cornered the cause. My job was the judgment: read the diffs, throw ugly and abusive inputs at the gate, and watch the bill. It writes *and* debugs faster than you can read; knowing what *correct* and *cheap* look like is the part that isn’t automated yet.

The part that surprised me most came at the end. The unglamorous polish — accessibility, SEO, the newer answer-engine optimization (being citable by AI), Open Graph cards, favicons, raw performance — used to be the most tedious, never-finished work in any project. There was always one more thing to remember.

So I had Claude Code audit every site top to bottom, recommend fixes, and implement them. Where Google’s PageSpeed Insights came back in the high 80s and low 90s, I just pasted in a screenshot; it diagnosed the cause, fixed it, and re-tested. They load fast and score a clean 100 across the board now. That same fluency made the cosmetic work cheap, too: I could try on a dozen color schemes and layouts across the sites in an afternoon, where each change used to cost a weekend.

That’s the whole shape of it: I never touched the socket plumbing or the SQL, and never wanted to. I spent my time on the parts a model can’t do for you yet — choosing the architecture, curating the sources, and judging whether the answer was actually right.

**Want to build your own?** I packaged all of this into a starter kit you can hand straight to Claude Code: the architecture, a paste-in starter prompt, a `CLAUDE.md` template, and the five phases above, all host-neutral.

**→ [Download the build kit](/domain-chatbot-build-kit.md)**, or point Claude Code at the URL and tell it your domain.

## What’s next

A couple of things are designed but not yet shipped. There’s a worked-out plan for **email gating** across all of them — a few free questions, then a one-time email verification before an anonymous visitor can become a real cost — and, for RennQuery specifically, a design for pulling **Wikimedia Commons photos** of car variants into answers. Both are in the “researched, not built” pile, which is where evening projects live.

But the real takeaway is that the pattern is *repeatable*. The first one took real effort. The fourth was mostly: pick a domain, write a new system prompt, curate the sources, decide what the knowledge base should be — and let Claude Code rebuild the rest. A general-purpose model plus a small amount of carefully curated local knowledge makes a surprisingly convincing specialist — and doing the curation yourself is what keeps it from making things up.

And none of this is really about jazz or watches. The architecture is domain-agnostic; the knowledge base is the only part that changes. Point the same machine at a product’s docs and support tickets and it’s a support bot that actually knows your software. At a law firm’s contracts and case history, it’s a research assistant. At a retailer’s inventory database, a concierge that knows what’s in stock. The same recipe works over a catalog of song lyrics, the full canon of a book or TV series, a library of meeting or podcast transcripts, a parts catalog and its spec sheets. The fun version is a jazz nerd in a box — but the same machine sits under a lot of problems that aren’t fun at all, and are worth real money.

If you want to poke at them: [JazzQuery](https://jazzquery.com), [RennQuery](https://rennquery.com), [WristQuery](https://wristquery.com), and [Celiac vs Me](https://celiacvsme.com). They’re all on my [things I’ve built](/apps) page, along with everything else.

The pattern’s so repeatable now that the only thing stopping a fifth is remembering who pays for the first four.