haosdent

Mentions: 1 · Type: Organization

// recent coverage: 1 mention

2026-06-20 19:13 · gist.github.com · large-language-models

Running GLM-5.2 (753B DeepSeek-Sparse-Attention MoE) on 8x A100 80GB with vLLM — TRITON_MLA_SPARSE backend (PR #38476), no-recompile install, benchmarks

A developer confirmed that GLM-5.2, a 753B-parameter DeepSeek-Sparse-Attention MoE model, runs on 8x A100 80GB GPUs using vLLM PR #38476, which adds a Triton sparse-MLA backend for Ampere architecture…

// co-occurs with: top 7 entities

GLM-5.2 (1), DeepSeek (1), vLLM (1), A100 (1), Triton (1), AWQ (1), cyankiwi (1)