FTS5 on Cloudflare D1 cut my Korean keyword search from 220ms to 22ms — here's what almost broke it

wpnews.pro

cd /news/developer-tools/fts5-on-cloudflare-d1-cut-my-korean-… · home › topics › developer-tools › article

[ARTICLE · art-37326] src=dev.to ↗ pub=2026-06-24T05:07Z topic=developer-tools verified=true sentiment=↑ positive

FTS5 on Cloudflare D1 cut my Korean keyword search from 220ms to 22ms — here's what almost broke it

A developer cut Korean keyword search latency from 220ms to 22ms on Cloudflare D1 by migrating to FTS5, but encountered pitfalls with negative bm25() scores and Korean tokenization. The developer mitigated recall loss by splitting user input with OR joins and set up triggers to keep the content table in sync. After six weeks at 390K rows, P50 latency was 22ms and P95 under 45ms.

read2 min views5 publishedJun 24, 2026

The bm25()

relevance score in FTS5 returns a negative float. Lower means more relevant. I ordered results DESC

for three days and wondered why garbage surfaced first.

I run 12 production Workers handling Korean D2C ad-ops. One indexes ~400K ad keywords — brand names, product slugs, mixed Korean/Latin search terms — and drives a dashboard that rerenders on every keystroke. The original LIKE '%키워드%'

scans were clocking 180–220ms in D1. That's a hard no for typeahead. FTS5 is shipped enabled in D1, no opt-in required, so the migration looked straightforward. It wasn't.

The first real trap is Korean tokenization. FTS5's default unicode61

tokenizer splits on whitespace and punctuation. Korean doesn't use spaces between morphemes consistently — a single compound word can contain what a user types as two separate search tokens. That mismatch kills recall on single-morpheme queries. By week 3 at 180K rows I was measuring 88% recall on a hand-verified 500-row golden set, down from 91% at 40K rows. The drop isn't catastrophic but it's directional, and it gets worse as the index grows with noisier long-tail terms. The mitigation I landed on was query-side: splitting user input and joining tokens with OR

before passing to MATCH

, which recovers most of the recall loss without touching the tokenizer.

const ftsQuery = query.trim().split(/\s+/).join(' OR ');

The second trap is content tables. content='kw_master'

means FTS5 stores only the index, not the raw text — good for space, bad if you forget that inserts, updates, and deletes on kw_master

don't automatically propagate. You need triggers, or the index silently drifts from the source table. I set up three triggers (insert, delete, update) and a one-time backfill. P50 latency at 390K rows after six weeks: 22ms. P95: under 45ms. Both within the sub-50ms target.

There's more to the full story — the exact trigger DDL that handles the content table sync without double-writing, how I sampled latency inside the Worker without killing performance, and the one wrangler command sequence I run at 3am when the index drifts.

I wrote up the full breakdown — including the week-6 benchmark table and the production incident that validated the trigger setup — over on dailymanuallab.com.

source & further reading

dev.to — original article Why AI Agents Fail Silently — And How to Fix It A technical deep-dive into the observability gap in multi-step LLM systems AI Coding Agents Need Tests More Than Prompts Dokploy Setup Guide

~/api · this article 200

$curl api.wpnews.pro/v1/news/fts5-on-cloudflare-d1-cu…

Read original on dev.to → dev.to/riversea/fts5-on-cloudflare-d1-cut-my-kor…

mentioned entities

Cloudflare D1

FTS5

bm25

unicode61

kw_master

metadata

slugfts5-on-cloudflare-d1-cut-my-korean-keyword-search-from-220ms-to-22ms-here-s-it

topic#developer-tools

secondary2 topics

sentimentpositive

canonicaldev.to

navigation

← prevBanks are rushing to recruit AI …

next →Statement from Five Eyes agencie…

── more in #developer-tools 4 stories · sorted by recency

dev.to · 25 Jun · #developer-tools

Voice mode in Claude Code (VS Code, macOS)

dev.to · 25 Jun · #developer-tools

Why AI Agents Fail Silently — And How to Fix It A technical deep-dive into the observability gap in multi-step LLM systems

dev.to · 25 Jun · #developer-tools

AI Coding Agents Need Tests More Than Prompts

dev.to · 25 Jun · #developer-tools

MCP Knowledge: Simple Beats Complex When AI Thinks

── more on @cloudflare d1 3 stories trending now

wpnews · 22 Jun · #generative-ai

Bain tests software takeover targets using vibecoding AI replicas

wpnews · 28 May · #ai-startups

The Niche SaaS Opportunity Map 2026: Highly Demanded Subscribed Categories Beyond Mainstream

wpnews · 24 Jun · #ai-policy

An AI startup is suing the US government for taking away Anthropic's new model

sponsored brought to you by zahid.host 4,200+ EU-deployed projects

reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main

→ Live at https://your-agent.zahid.host ✓

Get free account → Pricing

from €0/mo · no card required