Shepherd Model Gateway

mentions 1 type Person

// recent coverage: 1 mention

18:56

2026-04-30

pytorch.org

large-language-models

SMG: The Case for Disaggregating CPU from GPU in LLM Serving

Shepherd Model Gateway (SMG) has disaggregated all CPU-bound workloads from GPU inference in large language model serving, moving tokenization, detokenization, and parsing into a dedicated Rust gatewa…

// co-occurs with top 7 entities

SGLang (1), vLLM (1), Python (1), Rust (1), GIL (1), GPU (1), CPU (1)