04:23
2026-06-19
github.com
ai-infrastructure
Profile(v2.1.4) physics-aware optimizer for vLLM (31→470 tok/s on A100)
Profile v2.1.4, a physics-aware optimizer for vLLM inference servers, achieved a 15x throughput increase from 31 to 470 tok/s and a 93% cost reduction on an NVIDIA A100 GPU. The tool uses roofline mat…