Kyrieblunders (auto-discovered)

articles 1 domain kyrieblunders.bearblog.dev → feed RSS
17:26
2026-06-02
kyrieblunders.bearblog.dev
machine-learning

I made a kernel 2.2x faster. It made my training loop 3x slower

A developer wrote a fused decode-attention kernel that ran 2.2× faster than the baseline in microbenchmarks, but when integrated into a HuggingFace `generate` call for an RL training loop, the decode …