Most "AI games" phone home. Every turn is an API round-trip, every player burns your tokens, and the whole thing dies the day the bill scares you. I wanted the opposite: a text roguelike where the dungeon master is an LLM that runs entirely in the player's browser — no server, no API key, no per-token cost, and it keeps working offline after the first load.
Here's the architecture and the one bug that taught me the most.
WebLLM runs quantized models that have been compiled to WebGPU kernels, so inference happens on the player's GPU. There is no backend at all.
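    // MODEL_ID and set() (the loading-status updater) are defined elsewhere in the app.
    const cdn = "https://esm.run/@mlc-ai/web-llm";
    // Load the library from the CDN at runtime; webpackIgnore keeps the bundler from resolving it.
    const webllm = await import(/* webpackIgnore: true */ cdn);
    const engine = await webllm.CreateMLCEngine(MODEL_ID, {
      // Surface download and compile progress to the player.
      initProgressCallback: (p) => set(p.text),
    });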
First load pulls the weights once (the browser caches them). After that every turn is local and free.
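With no backend to fall back on, it helps to find out whether the browser can run the model at all before pulling the weights. Here's a minimal sketch of that check, not the game's actual code: the function name canRunOnDevice and the injectable nav parameter are my own, while navigator.gpu and requestAdapter() are the standard WebGPU entry points.

```javascript
// Sketch: decide whether on-device inference is possible before downloading weights.
// `nav` defaults to the browser's navigator; it is a parameter only so the
// function can be exercised outside a browser. The name is hypothetical.
async function canRunOnDevice(nav = globalThis.navigator) {
  // navigator.gpu is undefined in browsers without WebGPU.
  if (!nav || !nav.gpu) return false;
  try {
    // requestAdapter() resolves to null when no suitable GPU is available.
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    // Treat any failure as "not supported" rather than crashing the loader.
    return false;
  }
}
```

If this resolves to false, the page can show a plain "your browser can't run this yet" message instead of starting a large download that will never finish loading.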
A dungeon master should be creative, but it must not be allowed to break the rules. The split that made it stable: the model writes the narration, and the engine owns every number and every rule check.
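One way to enforce that split is to have the model answer in JSON and treat every field as a proposal the engine is free to reject. The sketch below is an illustration under assumptions: the field names narration and hpDelta, the MAX_HP_SWING cap, and the parseTurn and runTurn helpers are all hypothetical. It uses WebLLM's OpenAI-style engine.chat.completions.create call with response_format set to JSON mode; check the WebLLM docs for what your model and version support.

```javascript
// Hypothetical contract: the model must answer with
//   {"narration": string, "hpDelta": integer}
const MAX_HP_SWING = 5; // assumed per-turn cap, chosen for illustration

// Validate the model's raw reply. Returns null when the reply is unusable.
function parseTurn(raw) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON: the caller can retry or show a canned line
  }
  if (data === null || typeof data !== "object") return null;
  if (typeof data.narration !== "string" || data.narration.trim() === "") return null;
  // A missing or non-integer delta means "no change"; it is never trusted as-is.
  const proposed = Number.isInteger(data.hpDelta) ? data.hpDelta : 0;
  // Clamp so one hallucinated number cannot end the run.
  const hpDelta = Math.max(-MAX_HP_SWING, Math.min(MAX_HP_SWING, proposed));
  return { narration: data.narration, hpDelta };
}

// Ask the model for one turn, then run its reply through the validator.
async function runTurn(engine, systemPrompt, playerAction) {
  const reply = await engine.chat.completions.create({
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: playerAction },
    ],
    response_format: { type: "json_object" },
  });
  return parseTurn(reply.choices[0].message.content ?? "");
}
```

The model can still write whatever prose it likes inside narration; the only thing it can do to the game state is suggest a bounded integer.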
My most instructive bug: early on I let the prose drive death detection (a regex for phrases like "you die"), and the model cheerfully killed players on turn one with pure flavor text. "This could be the end of you" → game over. Moving death to an integer the engine owns, checked with if (hp <= 0), fixed it instantly.
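To make the fix concrete, here's a small sketch of engine-owned death. The state shape (hp, maxHp, dead, log) and the applyTurn name are assumptions for illustration, not the game's real code. The point is that narration is only ever appended to a log for display, while death is derived from the integer alone.

```javascript
// Engine-owned state: hp is an integer, and death is derived from it alone.
// `turn` is assumed to look like { narration: string, hpDelta: integer }.
function applyTurn(state, turn) {
  // Keep hp inside [0, maxHp] no matter what delta arrives.
  const hp = Math.max(0, Math.min(state.maxHp, state.hp + turn.hpDelta));
  return {
    ...state,
    hp,
    dead: hp <= 0, // the only place death is decided
    log: [...state.log, turn.narration], // prose is displayed, never parsed
  };
}

const start = { hp: 10, maxHp: 10, dead: false, log: [] };

// Ominous flavor text with no damage: the player survives.
const scary = applyTurn(start, { narration: "This could be the end of you.", hpDelta: 0 });
console.log(scary.dead); // false

// Actual damage that empties hp: now the engine ends the run.
const fatal = applyTurn(scary, { narration: "The blade finds its mark.", hpDelta: -10 });
console.log(fatal.dead); // true
```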
Rule of thumb: the LLM writes the story; your code keeps the score.
The tradeoff is model size: you run something small enough to load in a tab, so prompt design carries real weight. For a narrative game master that's a fair trade.
Genre is just a config object — palette, HUD labels, seed scenarios, system prompt. Same engine, swap the config, ship a different game. Adding a genre is data, not a code change, which means a generator can author new ones.
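As an illustration of what "genre as data" can look like, here's a hypothetical config plus a validator. Every field name and value below is invented for the example (the real configs may be shaped differently), and validateGenre and newGame are assumed helpers. A validator like this matters most when a generator is writing the configs, since it catches a malformed one before a player ever loads it.

```javascript
// Hypothetical genre config: all field names and values are illustrative.
const cyberpunk = {
  id: "cyberpunk",
  palette: { bg: "#0b0b14", fg: "#e6e6f0", accent: "#ff2e88" },
  hud: { hp: "Integrity", gold: "Credits" },
  seeds: ["You wake in a capsule hotel with someone else's keycard."],
  systemPrompt: "You are the game master of a neon-lit heist story.",
};

const REQUIRED = ["id", "palette", "hud", "seeds", "systemPrompt"];

// Returns a list of problems; an empty list means the config is usable.
function validateGenre(cfg) {
  const problems = REQUIRED
    .filter((key) => cfg[key] === undefined)
    .map((key) => `missing ${key}`);
  if (cfg.seeds !== undefined && (!Array.isArray(cfg.seeds) || cfg.seeds.length === 0)) {
    problems.push("seeds must be a non-empty array");
  }
  return problems;
}

// Start a run from a config. `rand` is injectable so a run can be reproduced.
function newGame(cfg, rand = Math.random) {
  const opening = cfg.seeds[Math.floor(rand() * cfg.seeds.length)];
  return { genre: cfg.id, hudLabels: cfg.hud, systemPrompt: cfg.systemPrompt, opening };
}
```

The engine code never mentions a genre by name; it only reads fields off whichever config it was handed.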
If you want to poke at a live one, the cyberpunk build (NeonHeist) and a few others are up under Games at bestpaid.app — all running on-device.
Happy to go deeper on the JSON-contract prompt or the WebGPU UX in the comments.