18:57
2026-06-16
injuly.in
large-language-models
Inference cost at scale with napkin math
A technical analysis calculates the dollar cost per user for serving large language models at scale using napkin math, breaking down GPU resources, matrix multiplication costs, and attention mechanism…