c0mpute

mentions: 1 · type: Organization

// recent coverage (1 mention)

2026-06-19 19:14

github.com

large-language-models

Pipeline-parallel LLM inference across GPUs on separate machines

A 744-billion-parameter GLM-5.2 model was served at ~30 tokens per second across six prosumer Blackwell GPUs in six US states over a wide-area network using pipeline parallelism and speculative decodi…

// co-occurs with top 6 entities

Shard (1) · GLM-5.2 (1) · RTX PRO 6000 (1) · NVIDIA (1) · GLM-4-9B (1) · Blackwell (1)