PyTorch

mentions: 154 · type: Organization · page 7/8

// recent coverage

02:11
2026-05-28
runwayml.com
machine-learning

DTensor, Correctness and the Costs of Abstraction

DTensor, PyTorch's distributed tensor abstraction, attaches placement metadata to every tensor to automatically propagate layouts and insert correct collective operations during distributed training. …

19:09
2026-05-27
pytorch.org
machine-learning

Why Is PyTorch Compile So Fast: Kernel Fusion

PyTorch's Inductor compiler uses kernel fusion to accelerate model execution by up to 10x, grouping dependent operations into single Triton kernels to reduce memory traffic and kernel launch overhead.…

11:02
2026-05-27
dev.to
ai-infrastructure

TensorCircuit-NG: Quantum Software On AI, For AI, With AI

TensorCircuit-NG, a quantum software stack built on AI infrastructure, treats quantum circuits as specialized tensor operations to leverage existing AI tooling for automatic differentiation, compilati…

06:53
2026-05-27
benfrederickson.com
machine-learning

Python as a Declarative Programming Language (2017)

Python runs roughly 40 times slower than C or C++ in the Benchmarks Game, yet it remains the dominant language for data analysis and machine learning because core libraries like NumPy, Ten…

05:37
2026-05-27
dev.to
machine-learning

The bf16 grad accumulator that killed our SDXL LoRA training

Photoroom's SDXL LoRA fine-tuning for a product photography model silently corrupted its adapter weights over six days due to a bf16 gradient accumulation issue. The custom training loop, forked from …

22:08
2026-05-26
developer.nvidia.com
ai-infrastructure

Extract More Kernel Performance with NVIDIA CompileIQ Auto-Tuning

NVIDIA released CompileIQ, an AI-powered compiler auto-tuning framework that uses evolutionary and genetic algorithms to optimize GPU compilers for individual workloads. The tool, included in NVIDIA C…

13:56
2026-05-26
github.com
ai-infrastructure

Wave – A universal GPU instruction set architecture

A new open-source project called WAVE has introduced a vendor-neutral GPU instruction set architecture that allows developers to write GPU code once and run identical binaries on NVIDIA, AMD, Apple, a…

12:51
2026-05-26
klongpy.org
machine-learning

KlongPy: PyTorch Back End and Autograd

KlongPy now supports a PyTorch backend that enables GPU acceleration and automatic differentiation for gradient-based computations. The torch backend outperforms NumPy by up to 8x on large arrays and …

11:50
2026-05-23
horace.io
machine-learning

Making Deep Learning Go Brrrr from First Principles (2022)

The article explains that optimizing deep learning performance should be approached by identifying whether a system is bottlenecked by compute, memory bandwidth, or overhead, rather than relying on ad…

11:50
2026-05-23
horace.io
machine-learning

Making Deep Learning Go Brrrr from First Principles

The article explains that optimizing deep learning performance should be approached by reasoning from first principles—identifying whether a system is bottlenecked by compute, memory bandwidth, or ove…

07:00
2026-05-22
leimao.github.io
machine-learning

PyTorch Triton Kernel Transparent Tracing and Compilation

PyTorch has introduced transparent tracing and compilation for Triton kernels, allowing custom operations to be visible to the compiler for optimization. The framework now supports compiling Triton ke…
