PyTorch Blog

17 articles · domain: pytorch.org · feed: RSS
13:47
2026-06-12
pytorch.org
artificial-intelligence

PyTorch Meetup Singapore: A milestone in APAC

Eighty engineers, researchers, and community builders gathered for the inaugural PyTorch Meetup Singapore, hosted at the Red Hat Asia Pacific office. The event featured technical talks on inference, d…

17:00
2026-06-10
pytorch.org
large-language-models

Portable vLLM Model Inference Kernels in Helion

Helion kernels were integrated into vLLM for FP8 inference using Qwen3 models and evaluated across NVIDIA H100 and B200 GPUs. The experiments demonstrated that Helion provides a productive PyTorch-nat…

15:05
2026-06-03
pytorch.org
machine-learning

Using Muon Optimizer with DeepSpeed

DeepSpeed has integrated the Muon Optimizer, a memory-efficient optimizer that uses a single momentum buffer and Newton-Schulz orthogonalization to improve training convergence, particularly for 2D we…

18:43
2026-06-01
pytorch.org
machine-learning

When does fragmentation occur in the CUDA caching allocator?

The CUDA caching allocator in PyTorch fragments memory when allocated blocks prevent adjacent free blocks from merging, causing allocation failures despite sufficient total free memory. This fragmenta…

19:09
2026-05-27
pytorch.org
machine-learning

Why Is PyTorch Compile So Fast: Kernel Fusion

PyTorch's Inductor compiler uses kernel fusion to accelerate model execution by up to 10x, grouping dependent operations into single Triton kernels to reduce memory traffic and kernel launch overhead.…

01:00
2026-05-27
pytorch.org
artificial-intelligence

Alibaba Cloud Joins the PyTorch Foundation as a Platinum Member

Alibaba Cloud has joined the PyTorch Foundation as a Platinum member, gaining a seat on the foundation's Governing Board and a position on its Technical Advisory Committee. The Chinese cloud computing…

15:45
2026-05-20
pytorch.org
machine-learning

PyTorch Docathon 2026 Results in 150+ Merged Pull Requests

The PyTorch Docathon 2026, held from May 5 to May 19, resulted in over 150 merged pull requests after more than 260 registrants and 30 active participants contributed fixes, API documentation, and Exe…

18:36
2026-05-13
pytorch.org
machine-learning

PyTorch 2.12 Release Blog

PyTorch 2.12 introduces a new device-agnostic `torch.accelerator.Graph` API that unifies graph capture and replay across CUDA, XPU, and out-of-tree backends. The release delivers up to 100x faster bat…

18:56
2026-04-30
pytorch.org
large-language-models

SMG: The Case for Disaggregating CPU from GPU in LLM Serving

Shepherd Model Gateway (SMG) has disaggregated all CPU-bound workloads from GPU inference in large language model serving, moving tokenization, detokenization, and parsing into a dedicated Rust gatewa…

15:25
2026-04-29
pytorch.org
large-language-models

Introducing AutoSP

Researchers at Microsoft have introduced AutoSP, a compiler-based solution that automatically converts standard training code into multi-GPU sequence parallel code for long-context language model trai…