
Leimao (auto-discovered)

Articles: 5 | Domain: leimao.github.io | Feed: RSS
2026-06-05 07:00 | leimao.github.io | machine-learning

Synchronizations With TorchRec KeyedJaggedTensor

TorchRec's KeyedJaggedTensor, designed to efficiently combine sparse features in recommendation systems without padding, introduces GPU-CPU synchronization that degrades system performance. The data t…

2026-06-01 15:39 | leimao.github.io | machine-learning

PyTorch Custom Operation

PyTorch users can now implement custom operations in C++ and CUDA for use in both Python and C++ inference programs, with automatic device dispatch between CPU and CUDA implementations. The approach s…

2026-05-28 07:00 | leimao.github.io | machine-learning

PyTorch AOTInductor Hybrid Lowering

PyTorch AOTInductor now compiles exported programs with hybrid CPU-GPU execution plans into a single executable package, eliminating the need to manually split models into separate device sub-models. …

2026-05-22 07:00 | leimao.github.io | machine-learning

PyTorch Triton Kernel Transparent Tracing and Compilation

PyTorch has introduced transparent tracing and compilation for Triton kernels, allowing custom operations to be visible to the compiler for optimization. The framework now supports compiling Triton ke…

2026-05-17 07:00 | leimao.github.io | machine-learning

PyTorch Fake Export

PyTorch introduced a "fake export" method that allows developers to verify the exportability of large deep learning models using `torch.export` APIs without requiring actual GPU memory. The approach u…