04:54
2026-05-22
arxiv.org
machine-learning
CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs
CODA, a GPU kernel abstraction that reparameterizes memory-bound Transformer operations like normalization and activations to execute as GEMM-plus-epilogue programs, keeping data on-chip to reduce glo…