17:00
2026-06-10
pytorch.org
large-language-models
Portable vLLM Model Inference Kernels in Helion
Helion kernels were integrated into vLLM for FP8 inference using Qwen3 models and evaluated across NVIDIA H100 and B200 GPUs. The experiments demonstrated that Helion provides a productive PyTorch-nat…