19:14
2026-06-19
github.com
large-language-models
Pipeline-parallel LLM inference across GPUs on separate machines
A 744-billion-parameter GLM-5.2 model was served at ~30 tokens per second across six prosumer Blackwell GPUs in six US states over a wide-area network using pipeline parallelism and speculative decodi…