grep -l @sdpa /news/*.json | wc -l → 1

@SDPA

mentions 1 type Organization feed RSS
17:26
2026-06-02
kyrieblunders.bearblog.dev
machine-learning

I made a kernel 2.2x faster. It made my training loop 3x slower

A developer wrote a fused decode-attention kernel that ran 2.2× faster than the baseline in microbenchmarks, but when integrated into a HuggingFace `generate` call for an RL training loop, the decode …

// co-occurs with top 6 entities