# FTS5 on Cloudflare D1 cut my Korean keyword search from 220ms to 22ms — here's what almost broke it

> Source: <https://dev.to/riversea/fts5-on-cloudflare-d1-cut-my-korean-keyword-search-from-220ms-to-22ms-heres-what-almost-broke-it-3l9p>
> Published: 2026-06-24 05:07:42+00:00

The `bm25()`

relevance score in FTS5 returns a *negative* float. Lower means more relevant. I ordered results `DESC`

for three days and wondered why garbage surfaced first.

I run 12 production Workers handling Korean D2C ad-ops. One indexes ~400K ad keywords — brand names, product slugs, mixed Korean/Latin search terms — and drives a dashboard that rerenders on every keystroke. The original `LIKE '%키워드%'`

scans were clocking 180–220ms in D1. That's a hard no for typeahead. FTS5 is shipped enabled in D1, no opt-in required, so the migration looked straightforward. It wasn't.

The first real trap is Korean tokenization. FTS5's default `unicode61`

tokenizer splits on whitespace and punctuation. Korean doesn't use spaces between morphemes consistently — a single compound word can contain what a user types as two separate search tokens. That mismatch kills recall on single-morpheme queries. By week 3 at 180K rows I was measuring 88% recall on a hand-verified 500-row golden set, down from 91% at 40K rows. The drop isn't catastrophic but it's directional, and it gets worse as the index grows with noisier long-tail terms. The mitigation I landed on was query-side: splitting user input and joining tokens with `OR`

before passing to `MATCH`

, which recovers most of the recall loss without touching the tokenizer.

``` js
const ftsQuery = query.trim().split(/\s+/).join(' OR ');
```

The second trap is content tables. `content='kw_master'`

means FTS5 stores only the index, not the raw text — good for space, bad if you forget that inserts, updates, and deletes on `kw_master`

don't automatically propagate. You need triggers, or the index silently drifts from the source table. I set up three triggers (insert, delete, update) and a one-time backfill. P50 latency at 390K rows after six weeks: 22ms. P95: under 45ms. Both within the sub-50ms target.

There's more to the full story — the exact trigger DDL that handles the content table sync without double-writing, how I sampled latency inside the Worker without killing performance, and the one wrangler command sequence I run at 3am when the index drifts.

I wrote up the full breakdown — including the week-6 benchmark table and the production incident that validated the trigger setup — over on dailymanuallab.com.
