[ARTICLE · art-31850] src=twitter.com ↗ pub= topic=machine-learning verified=true sentiment=↑ positive

Compiles any HuggingFace model into a single persistent megakernel

A developer open-sourced AutoMegakernel, a tool that compiles a HuggingFace model (Llama-family only for now) into a single persistent megakernel, replacing one kernel launch per op with one launch per forward pass. It includes a static validator that checks a schedule is free of deadlocks and races before launch. The post reports speedups of up to 1.33x on L4 and 1.25-1.27x on L40S GPUs for batch-1 int8 inference compared to CUDA-graphed cuBLAS bf16, though it loses on A100/H100.

read: 1 min · views: 2 · published: Jun 17, 2026

i open-sourced automegakernel -- compiles any huggingface model into a single persistent megakernel

batch-1 decode is bandwidth-bound. normal execution launches one kernel per op and round-trips activations through HBM dozens of times a layer. that overhead is the whole problem

[automegakernel compiles t]he entire forward pass into one launch. one launch = one forward = one token

the hard part: a single kernel across every SM, synced only by counters, is a deadlock/race minefield. so the core piece is a static validator that proves any schedule deadlock-free + race-free before launch. an agent can edit the schedule freely and can't ship a hanging kernel. 7160 adversarial schedules, 6091 unsafe, zero false accepts

one source retargets sm_80 / sm_90 / sm_120. reproduces huggingface greedy decode token-for-token on real smollm2-135m

search-found int8 megakernel beats cuda-graphed cuBLAS bf16 at batch-1: L4 up to 1.33x, L40S 1.25-1.27x. it loses on A100/H100 and we say so

llama-family only for now :p

sc:
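The post does not show the validator itself. The sketch below illustrates the general idea under a deliberately simplified model, and is not AutoMegakernel's API: every name here (Op, find_deadlock, find_races, validate) is hypothetical. It assumes each SM runs a fixed queue of ops in order, counters only ever increase by 1 per signal, an op starts once each counter it waits on has reached its threshold, and op names are unique. Under those assumptions a schedule can be checked ahead of time by replaying the queues (deadlock) and by requiring that conflicting buffer accesses are ordered by program order or by a counter wait (race).

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Op:
    name: str                       # assumed unique within a schedule
    waits: tuple = ()               # (counter, threshold): start once counter >= threshold
    signals: tuple = ()             # counters incremented by 1 when the op finishes
    reads: frozenset = frozenset()  # buffers read
    writes: frozenset = frozenset() # buffers written


def find_deadlock(schedule):
    """Replay the per-SM queues; return names of ops that can never start."""
    counters = {}
    heads = [0] * len(schedule)
    progressed = True
    while progressed:
        progressed = False
        for sm, queue in enumerate(schedule):
            while heads[sm] < len(queue):
                op = queue[heads[sm]]
                if any(counters.get(c, 0) < k for c, k in op.waits):
                    break  # head of this SM is blocked for now
                for c in op.signals:
                    counters[c] = counters.get(c, 0) + 1
                heads[sm] += 1
                progressed = True
    # Counters are monotone, so if this replay stalls, every execution stalls.
    return [q[h].name for q, h in zip(schedule, heads) if h < len(q)]


def find_races(schedule):
    """Return pairs of ops that touch the same buffer without a guaranteed order."""
    ops = [op for queue in schedule for op in queue]
    totals = {}
    for op in ops:
        for c in op.signals:
            totals[c] = totals.get(c, 0) + 1
    # before[b] = names of ops guaranteed to have finished when b starts
    before = {op.name: set() for op in ops}
    for queue in schedule:
        for earlier, later in zip(queue, queue[1:]):
            before[later.name].add(earlier.name)  # program order on one SM
    for op in ops:
        for c, k in op.waits:
            # Conservative: a wait orders after the signalers only if it waits
            # for every signal on that counter.
            if k >= totals.get(c, 0):
                before[op.name].update(s.name for s in ops if c in s.signals)
    changed = True
    while changed:  # transitive closure
        changed = False
        for b in before:
            extra = set().union(*(before[a] for a in before[b])) - before[b]
            if extra:
                before[b] |= extra
                changed = True
    races = []
    for a, b in combinations(ops, 2):
        conflict = (a.writes & (b.reads | b.writes)) or (b.writes & a.reads)
        ordered = a.name in before[b.name] or b.name in before[a.name]
        if conflict and not ordered:
            races.append((a.name, b.name))
    return races


def validate(schedule):
    stuck = find_deadlock(schedule)
    if stuck:
        return False, f"deadlock: {stuck}"
    races = find_races(schedule)
    if races:
        return False, f"race: {races}"
    return True, "ok"


# Two SMs: one produces buffer "h", the other consumes it.
matmul = Op("matmul", signals=("h_ready",), writes=frozenset({"h"}))
norm = Op("norm", waits=(("h_ready", 1),), reads=frozenset({"h"}), writes=frozenset({"y"}))
safe = [[matmul], [norm]]
# Same ops, but the consumer forgot to wait on the counter.
racy = [[matmul], [Op("norm", reads=frozenset({"h"}), writes=frozenset({"y"}))]]
# Each op waits for the other's signal, so neither can start.
cyclic = [[Op("a", waits=(("b_done", 1),), signals=("a_done",))],
          [Op("b", waits=(("a_done", 1),), signals=("b_done",))]]

for label, sched in [("safe", safe), ("racy", racy), ("cyclic", cyclic)]:
    print(label, validate(sched))
```

The race check errs on the side of rejecting: a wait on a threshold lower than the counter's total signal count creates no ordering at all, so some safe schedules would be refused, but within this model an accepted schedule has every conflicting pair ordered. A real megakernel validator would also have to model things this sketch leaves out, such as memory fences and partial-tile dependencies.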
