cd /news/developer-tools/fts5-on-cloudflare-d1-cut-my-korean-… · home topics developer-tools article
[ARTICLE · art-37326] src=dev.to ↗ pub= topic=developer-tools verified=true sentiment=↑ positive

FTS5 on Cloudflare D1 cut my Korean keyword search from 220ms to 22ms — here's what almost broke it

A developer cut Korean keyword search latency from 220ms to 22ms on Cloudflare D1 by migrating to FTS5, but encountered pitfalls with negative bm25() scores and Korean tokenization. The developer mitigated recall loss by splitting user input with OR joins and set up triggers to keep the content table in sync. After six weeks at 390K rows, P50 latency was 22ms and P95 under 45ms.

read2 min views5 publishedJun 24, 2026

The bm25()

relevance score in FTS5 returns a negative float. Lower means more relevant. I ordered results DESC

for three days and wondered why garbage surfaced first.

I run 12 production Workers handling Korean D2C ad-ops. One indexes ~400K ad keywords — brand names, product slugs, mixed Korean/Latin search terms — and drives a dashboard that rerenders on every keystroke. The original LIKE '%키워드%'

scans were clocking 180–220ms in D1. That's a hard no for typeahead. FTS5 is shipped enabled in D1, no opt-in required, so the migration looked straightforward. It wasn't.

The first real trap is Korean tokenization. FTS5's default unicode61

tokenizer splits on whitespace and punctuation. Korean doesn't use spaces between morphemes consistently — a single compound word can contain what a user types as two separate search tokens. That mismatch kills recall on single-morpheme queries. By week 3 at 180K rows I was measuring 88% recall on a hand-verified 500-row golden set, down from 91% at 40K rows. The drop isn't catastrophic but it's directional, and it gets worse as the index grows with noisier long-tail terms. The mitigation I landed on was query-side: splitting user input and joining tokens with OR

before passing to MATCH

, which recovers most of the recall loss without touching the tokenizer.

const ftsQuery = query.trim().split(/\s+/).join(' OR ');

The second trap is content tables. content='kw_master'

means FTS5 stores only the index, not the raw text — good for space, bad if you forget that inserts, updates, and deletes on kw_master

don't automatically propagate. You need triggers, or the index silently drifts from the source table. I set up three triggers (insert, delete, update) and a one-time backfill. P50 latency at 390K rows after six weeks: 22ms. P95: under 45ms. Both within the sub-50ms target.

There's more to the full story — the exact trigger DDL that handles the content table sync without double-writing, how I sampled latency inside the Worker without killing performance, and the one wrangler command sequence I run at 3am when the index drifts.

I wrote up the full breakdown — including the week-6 benchmark table and the production incident that validated the trigger setup — over on dailymanuallab.com.

── more in #developer-tools 4 stories · sorted by recency
── more on @cloudflare d1 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/fts5-on-cloudflare-d…] indexed:0 read:2min 2026-06-24 ·