Fable 5 pushed Gemma 4 to 255 tok/s on WebGPU

Fable 5, an AI agent, achieved 255 tokens per second on Gemma 4 inference using WebGPU before its access was suspended. The developer released the demo and kernels, claiming agentic kernel optimization is the future of on-device inference.

Before Fable 5 was shut down, it pushed Gemma 4 to 255 tok/s on WebGPU. Some didn't believe it was real. Today we're releasing the demo and kernels it wrote for you to see yourself. Run it locally in your browser. Agentic kernel optimization is the future of on-device inference.

I gave Fable 5 one job: write custom WebGPU kernels for Gemma 4 inference. It climbed to 84 tok/s, then hit a wall, insisting further optimization was impossible. Hours later, Anthropic rolled back invisible LLM development safeguards, and it hit 255 tok/s. The next day, access to Fable 5 was suspended globally.

Jun 17, 2026 · 4:54 PM UTC

In case you hadn't noticed, we're working on something big. Stay tuned.

🔗 Link to the demo: https://huggingface.co/spaces/webml-community/gemma-4-webgpu-kernels