Fable 5 pushed Gemma 4 to 255 tok/s on WebGPU

wpnews.pro

Source: xcancel.com · Published: 2026-06-18T14:14Z · Topic: artificial-intelligence

Fable 5, an AI agent, achieved 255 tokens per second on Gemma 4 inference using WebGPU before its access was suspended. The developer released the demo and kernels, claiming agentic kernel optimization is the future of on-device inference.

Before Fable 5 was shut down, it pushed Gemma 4 to 255 tok/s on WebGPU. Some didn't believe it was real. Today we're releasing the demo and the kernels it wrote so you can see for yourself. Run it locally in your browser. Agentic kernel optimization is the future of on-device inference.

I gave Fable 5 one job: write custom WebGPU kernels for Gemma 4 inference. It climbed to 84 tok/s, then hit a wall, insisting further optimization was impossible. Hours later, Anthropic rolled back invisible LLM development safeguards, and it hit 255 tok/s. The next day, access to Fable 5 was suspended globally.

Jun 17, 2026 · 4:54 PM UTC

In case you hadn't noticed, we're working on something big. Stay tuned. Link to the demo: huggingface.co/spaces/webml-…

Read original on xcancel.com → xcancel.com/xenovacom/status/2067289897111638484
