15:39
2026-05-27
pytorch.org
large-language-models
Up to 580tps! New Speed Record of Qwen3.5-397B-A17B on GPU for Agentic Workloads with TokenSpeed
TokenSpeed, an open-source inference engine, achieved a record-breaking 580 tokens per second running the Qwen3.5-397B-A17B model on GPUs. The performance gain for agentic workloads comes from elimina…