SDPA

mentions: 2, type: Organization

// recent coverage 2 mentions

04:03

2026-07-21

dev.to

machine-learning

Testing PyTorch 2.13 MPS FlexAttention on M1 Max: Up to 7.83x Faster for Sparse Attention

A developer benchmarked PyTorch 2.13's FlexAttention on an M1 Max Mac, finding up to 7.83x speedup over standard SDPA for sparse attention with 32,768 tokens and a 256-token local window. FlexAttentio…
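To give a sense of how sparse the benchmarked pattern is, the sketch below counts the query/key pairs that a 256-token local window keeps at 32,768 tokens. It is plain Python, not the PyTorch API. It assumes a causal window, where each query sees itself and the preceding tokens up to `window` in total; the summary does not say whether the benchmark's window is causal or symmetric. The names `local_window_mask`, `attended_pairs`, and `density` are hypothetical.

```python
def local_window_mask(q_idx: int, kv_idx: int, window: int) -> bool:
    """Assumed mask rule: a query attends to itself and the previous window-1 keys."""
    return 0 <= q_idx - kv_idx < window


def attended_pairs(seq_len: int, window: int) -> int:
    """Count the (query, key) pairs kept by the assumed causal local window."""
    # Query i can see keys i-window+1 .. i, clipped at the start of the sequence.
    return sum(min(i + 1, window) for i in range(seq_len))


def density(seq_len: int, window: int) -> float:
    """Fraction of the full seq_len x seq_len score matrix that is kept."""
    return attended_pairs(seq_len, window) / (seq_len * seq_len)


if __name__ == "__main__":
    n, w = 32_768, 256  # figures quoted in the summary above
    print(f"kept pairs: {attended_pairs(n, w):,}")
    print(f"density:    {density(n, w):.4%}")
```

Under these assumptions, under 1% of the dense score matrix is kept. That indicates how much work a sparsity-aware kernel can skip, but it does not by itself predict the measured speedup.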

17:26

2026-06-02

kyrieblunders.bearblog.dev

machine-learning

I made a kernel 2.2x faster. It made my training loop 3x slower

A developer wrote a fused decode-attention kernel that ran 2.2× faster than the baseline in microbenchmarks, but when integrated into a HuggingFace `generate` call for an RL training loop, the decode …

// co-occurs with top 8 entities

HuggingFace (1), Qwen2.5-0.5B-Instruct (1), Dr. GRPO (1), GSM8K (1), A10G (1), CuteDSL (1), PyTorch (1), Apple Silicon (1)

// topics top 6 topics

machine learning (2), large language models (2), ai infrastructure (2), artificial intelligence (1), ai research (1), developer tools (1)