{"slug": "dgx-spark-hitting-83-c-under-sustained-ollama-load-solved-by-clock-locking-via", "title": "DGX Spark hitting 83 C under sustained Ollama load — solved by clock-locking via nvidia-smi -lgc", "summary": "A developer created a daemon to reduce GPU temperatures on the NVIDIA DGX Spark by clock-locking via nvidia-smi -lgc. The daemon samples temperature every 30 seconds and adjusts clock ceilings, dropping sustained temperatures from 83°C to 72°C under heavy Ollama workloads. The solution addresses the lack of user-exposed power-limit or fan-curve controls on the GB10 GPU.", "body_md": "**TL;DR:** GB10 in the DGX Spark has no user-exposed power-limit or fan-curve control (`nvidia-smi` returns `[N/A]` for both; they are firmware-managed). But `nvidia-smi --lock-gpu-clocks` DOES work. I wrote a tiny daemon that samples the temperature every 30 s, steps the clock ceiling down 150 MHz on every sample that lands in the warning band, and relaxes it back up after 3 consecutive cool samples. With an Ollama gpt-oss:120b + qwen2.5:72b workload, sustained temperature dropped from 83 °C to 72 °C at the same utilization.\n\nMy DGX Spark serving Ollama (~40 GB VRAM across three model instances, sustained 94% util) sits at 82–84 °C indefinitely. No thermal-throttle events yet, but that's uncomfortably close to the SW-slowdown threshold. Standard cooling knobs are absent:\n\n``` bash\n$ nvidia-smi --query-gpu=power.limit,power.max_limit,power.min_limit,fan.speed --format=csv,noheader\n[N/A], [N/A], [N/A], [N/A]\n```\n\nEverything is firmware-managed. `nvidia-smi --help` still lists `-lgc` / `--lock-gpu-clocks`, though, and it works: GB10 accepts arbitrary integer MHz values within the silicon's range even though `--query-supported-clocks=graphics` returns `[N/A]`:\n\n``` bash\n$ sudo nvidia-smi -lgc 1500,2000 -i 0\nGPU clocks set to \"(gpuClkMin 1500, gpuClkMax 2000)\" for GPU 0000000F:01:00.0\nAll done.\n```\n\nThree-band hysteresis, one actuator. 
Pseudocode:\n\n```\nevery 30s:\n  read temp.gpu\n  if temp >= 78 C:                         step_down(150 MHz), bounded by floor\n  elif temp <= 72 C and cool_streak >= 3:  step_up(150 MHz), bounded by ceil\n  else:                                    hold\n  cool_streak = cool_streak+1 if temp <= 72 C else 0\n```\n\nThe setpoints are 78 °C (hot band) and 72 °C (cool band); the clock ceiling is bounded by a floor of 1800 MHz and a maximum of 3000 MHz (GB10 max is ~3003 MHz). At a sustained 83 °C it walks the ceiling down in 150 MHz steps every 30 seconds until the temperature leaves the hot band, then holds. When load drops, it relaxes back to the ceiling on a 3-sample cool streak so a brief dip doesn't clock the whole GPU down for the next hour.\n\nSame Ollama workload throughout, no config changes to the models or the server:\n\n```\ntime      temp   clock  util   action\n07:46:28  82 C   2463   94%    STEP_DOWN\n07:47:28  83 C   2463   94%    STEP_DOWN\n07:47:58  83 C   2463   94%    STEP_DOWN\n07:56:29  76 C   1976   95%    HOLD\n07:57:29  77 C   1976   96%    HOLD\n08:13:44  72 C   2093   94%    HOLD (cool streak 1)\n08:14:14  72 C   2093   94%    HOLD (cool streak 2)\n```\n\n−11 °C sustained. No throttle events across the window. Latency impact is real but bounded: the 1800 MHz floor vs. the stock 2463 MHz is a ≈27% worst-case clock reduction, and in practice the daemon rides well above the floor (the logged clock sat at 1976–2093 MHz, roughly 15–20% below stock).\n\n`sudo nvidia-smi -lgc` needs passwordless sudo for the daemon user. I scope it in `/etc/sudoers.d/` to only `-lgc *` and `-rgc`. Wrote it up as a licensed install at [https://thermal.zctechnologies.org](https://thermal.zctechnologies.org): Go daemon, systemd unit, scoped sudoers, per-node monthly. 
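\n\nFor readers who want the shell recipe, here is a minimal bash sketch of the pseudocode above, for illustration only; it is not the Go daemon from the licensed install. The function names `decide` and `run_loop`, the `run` argument, the filename `thermal.sh`, GPU index 0 and the 1500 MHz lower bound passed to `-lgc` (borrowed from the example command earlier) are all assumptions. `decide` is a pure function, so the band logic can be checked without a GPU; only `run_loop` and the exit trap call `nvidia-smi`.\n\n``` bash\n#!/usr/bin/env bash\n# Illustrative sketch of the three-band hysteresis loop. Names, the run\n# argument, GPU index 0 and LOCK_MIN are assumptions, not the real daemon.\nset -euo pipefail\n\nHOT=78         # °C: at or above this, step the ceiling down\nCOOL=72        # °C: at or below this counts as a cool sample\nSTEP=150       # MHz per adjustment\nFLOOR=1800     # MHz: the ceiling never drops below this\nCEIL=3000      # MHz: the ceiling never rises above this (GB10 max ~3003)\nNEED_COOL=3    # cool streak required before stepping up\nINTERVAL=30    # seconds between samples\nLOCK_MIN=1500  # MHz: assumed lower bound passed to -lgc\n\n# decide TEMP CEILING STREAK -> prints ACTION NEW_CEILING NEW_STREAK.\n# Pure function with no nvidia-smi calls, so it is testable without a GPU.\ndecide() {\n  local temp=$1 ceiling=$2 streak=$3 action=HOLD\n  if (( temp >= HOT )); then\n    action=STEP_DOWN\n    ceiling=$(( ceiling - STEP < FLOOR ? FLOOR : ceiling - STEP ))\n  elif (( temp <= COOL && streak >= NEED_COOL )); then\n    action=STEP_UP\n    ceiling=$(( ceiling + STEP > CEIL ? CEIL : ceiling + STEP ))\n  fi\n  # As in the pseudocode, the streak is updated after the decision.\n  if (( temp <= COOL )); then streak=$(( streak + 1 )); else streak=0; fi\n  echo $action $ceiling $streak\n}\n\nrun_loop() {\n  local ceiling=$CEIL streak=0 temp action\n  while true; do\n    temp=$(nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits -i 0)\n    read -r action ceiling streak < <(decide $temp $ceiling $streak)\n    # Re-lock clocks only on STEP_DOWN/STEP_UP; HOLD leaves them as they are.\n    if [[ $action != HOLD ]]; then\n      sudo nvidia-smi -lgc $LOCK_MIN,$ceiling -i 0 > /dev/null\n    fi\n    echo $(date +%T) $temp C ceiling=$ceiling streak=$streak $action\n    sleep $INTERVAL\n  done\n}\n\n# Start the loop only when invoked as: ./thermal.sh run\nif [[ ${1:-} == run ]]; then\n  trap 'exit 0' INT TERM                         # runs after the current sleep returns\n  trap 'sudo nvidia-smi -rgc > /dev/null' EXIT   # restore stock clocks on exit\n  run_loop\nfi\n```\n\nStart it with `./thermal.sh run`. The `nvidia-smi` invocations keep the argument order from the commands shown earlier (`-lgc MIN,MAX -i 0`, bare `-rgc`) so they stay inside a sudoers rule scoped to `-lgc *` and `-rgc`. Because the streak is updated after the decision, exactly as in the pseudocode, the first step up lands on the fourth consecutive cool sample.\n\n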
Comment or DM if you'd rather just have the shell recipe; the algorithm above is the whole thing, and I'm happy to answer questions about setpoints or the `ExecStopPost=nvidia-smi -rgc` teardown that returns your GPU to stock clocks on a graceful stop.", "url": "https://wpnews.pro/news/dgx-spark-hitting-83-c-under-sustained-ollama-load-solved-by-clock-locking-via", "canonical_source": "https://dev.to/deal_estate_715bf4569d373/dgx-spark-hitting-83degc-under-sustained-ollama-load-solved-by-clock-locking-via-nvidia-smi-lgc-1pn6", "published_at": "2026-07-01 15:38:44+00:00", "updated_at": "2026-07-01 15:48:59.563575+00:00", "lang": "en", "topics": ["developer-tools", "ai-infrastructure", "machine-learning"], "entities": ["NVIDIA", "DGX Spark", "GB10", "Ollama", "nvidia-smi", "zctechnologies.org"], "alternates": {"html": "https://wpnews.pro/news/dgx-spark-hitting-83-c-under-sustained-ollama-load-solved-by-clock-locking-via", "markdown": "https://wpnews.pro/news/dgx-spark-hitting-83-c-under-sustained-ollama-load-solved-by-clock-locking-via.md", "text": "https://wpnews.pro/news/dgx-spark-hitting-83-c-under-sustained-ollama-load-solved-by-clock-locking-via.txt", "jsonld": "https://wpnews.pro/news/dgx-spark-hitting-83-c-under-sustained-ollama-load-solved-by-clock-locking-via.jsonld"}}