DeepGEMM

Type: Organization · 1 mention

// recent coverage (1 mention)

2026-06-10 17:00 · pytorch.org · large-language-models

Portable vLLM Model Inference Kernels in Helion

Helion kernels were integrated into vLLM for FP8 inference using Qwen3 models and evaluated across NVIDIA H100 and B200 GPUs. The experiments demonstrated that Helion provides a productive PyTorch-nat…

// co-occurring entities (top 7)

vLLM (1) · Helion (1) · NVIDIA (1) · H100 (1) · B200 (1) · Qwen3 (1) · CUTLASS (1)

// topics (top 5)

large language models (1) · ai infrastructure (1) · ai chips (1) · machine learning (1) · ai tools (1)