21:49
2026-07-03
wafer.ai
large-language-models
GLM5.2 on AMD MI355X at 2626 tok/s/node at over 2x lower cost than Blackwell
Wafer served GLM5.2 on AMD MI355X GPUs at 2626 tokens per second per node, at over 2x lower cost than NVIDIA Blackwell, and reached 213 tok/s on a single stream. The company used MXFP4 quantization via AMD …