16:16
2026-05-28
blog.kog.ai
large-language-models
Delayed Tensor Parallelism for Faster Transformer Inference
Kog Team researchers introduced Delayed Tensor Parallelism (DTP), a Transformer architecture that hides communication overhead behind computation and weight streaming to accelerate batch-size-one infe…