16:29
2026-06-18
devashish.me
large-language-models
Two Qwen3 models on one DGX Spark: the residency math
Alibaba's Qwen3-80B and Qwen3-4B models were successfully co-located on a single NVIDIA DGX Spark using vLLM containers behind a LiteLLM proxy, but the 80B model's inability to emit tool calls in auto…