06:22
2026-05-26
arxiv.org
machine-learning
ThriftAttention: Selective Mixed Precision for Long-Context FP4 Attention
Researchers have developed ThriftAttention, a mixed-precision attention algorithm that selectively computes only 5% of query-key blocks in FP16 precision while processing the remaining 95% in FP4, rec…