18:56
2026-04-30
pytorch.org
large-language-models
SMG: The Case for Disaggregating CPU from GPU in LLM Serving
Shepherd Model Gateway (SMG) has disaggregated all CPU-bound workloads from GPU inference in large language model serving, moving tokenization, detokenization, and parsing into a dedicated Rust gatewa…