nvidia-smi cheat sheet

**nvidia-smi** (NVIDIA System Management Interface) is a command-line tool for monitoring, managing, and diagnosing NVIDIA GPU devices. It communicates directly with the NVIDIA driver and reports data on performance, temperature, utilization, power, and memory. It supports real-time monitoring with options like `-l` for periodic updates, `dmon` for device monitoring, and `--query-gpu` for exporting detailed metrics in CSV format for scripting and automation. It also provides administrative control, including setting power limits, locking clock speeds, and enabling persistence mode. GPU processes it reports can then be terminated with standard tools such as `kill`.

With nvidia-smi you can:

- Monitor GPU performance, temperature, and utilization
- Manage power, clock speeds, and ECC
- Control persistence mode and compute modes
- Query detailed metrics for automation and monitoring

**Default summary**

```bash
nvidia-smi
```

Shows a summary table with:

- GPU index, name, and PCI bus ID (run `nvidia-smi -L` to list each GPU's UUID)
- Driver version and the highest CUDA version that driver supports
- GPU utilization and memory usage
- Power consumption and temperature
- Active processes using the GPU

**Continuous monitoring**

```bash
# Refresh every 5 seconds
nvidia-smi -l 5

# Refresh every 500 milliseconds
nvidia-smi -lms 500

# Write output to a log file, refreshing every 5 seconds
nvidia-smi --filename=/var/log/gpu.log -l 5
```

**Device monitoring (`dmon`)**

```bash
nvidia-smi dmon
```

Example output (pwr in W, temp in °C, sm/mem/enc/dec in % utilization, mclk/pclk in MHz):

```
gpu   pwr  temp   sm  mem  enc  dec  mclk  pclk
  0    85    64   23    5    0    0   405  1110
```

**Querying specific metrics**

```bash
nvidia-smi --query-gpu=index,name,uuid,temperature.gpu,utilization.gpu,memory.used,memory.total --format=csv
```

Output:

```
index, name, uuid, temperature.gpu, utilization.gpu [%], memory.used [MiB], memory.total [MiB]
0, NVIDIA RTX A6000, GPU-02afcc1a-…, 58, 72 %, 13456 MiB, 49152 MiB
```

Other useful queries:

```bash
# Memory used and total
nvidia-smi --query-gpu=memory.used,memory.total --format=csv

# Temperature and power draw as bare values (no header row, no units)
nvidia-smi --query-gpu=temperature.gpu,power.draw --format=csv,noheader,nounits

# GPU name only
nvidia-smi --query-gpu=name --format=csv,noheader
```

**Process monitoring (`pmon`)**

```bash
# Take a single sample and exit
nvidia-smi pmon -c 1

# Monitor continuously (stop with Ctrl+C)
nvidia-smi pmon
```

Example output (type `C` = compute, `G` = graphics):

```
gpu   pid  type   sm  mem  enc  dec  command
  0  3024     C   23    5    0    0  python3
```

Terminate a process (replace `<PID>` with the value from the `pid` column):

```bash
sudo kill -9 <PID>
```
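The `csv,noheader,nounits` format is the easiest one to consume from scripts. As a minimal sketch, the Bash function below reads `index, temperature.gpu` rows and reports any GPU above a limit. The function name `check_temps` and the 80 °C threshold in the usage line are hypothetical, chosen only for illustration.

```bash
# Hypothetical helper: report GPUs whose temperature exceeds a limit.
# Expects stdin lines like "0, 58" (index, temperature.gpu), e.g. from:
#   nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits
check_temps() {
  local limit="$1" idx temp
  # IFS=', ' splits each line on the comma and the space after it
  while IFS=', ' read -r idx temp; do
    [ -n "$idx" ] || continue   # skip blank lines
    if [ "$temp" -gt "$limit" ]; then
      echo "GPU $idx: ${temp}C exceeds ${limit}C"
    fi
  done
}
```

Usage: `nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits | check_temps 80`. Integer comparison works here because `temperature.gpu` is reported as a whole number; fields such as `power.draw` are decimals and would need `awk` instead.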
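The `memory.used,memory.total` query can be turned into a usage percentage with a little arithmetic (used / total × 100). This sketch assumes the `noheader,nounits` format, so each line holds two bare MiB values such as `13456, 49152`. The function name `mem_percent` is hypothetical.

```bash
# Hypothetical helper: print memory usage as a percentage, one line per GPU.
# Expects stdin lines like "13456, 49152" (memory.used, memory.total in MiB), e.g. from:
#   nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits
mem_percent() {
  # Split on ", "; skip malformed lines and avoid dividing by zero
  awk -F', ' 'NF == 2 && $2 > 0 { printf "%.1f%%\n", 100 * $1 / $2 }'
}
```

For the RTX A6000 example values (13456 of 49152 MiB), `printf '13456, 49152\n' | mem_percent` prints `27.4%`.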
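Before reaching for `kill`, it can help to list only the compute processes that `pmon` reports. This sketch assumes the column order shown in the `pmon` example (`gpu pid type ...`), skips header lines and idle rows whose `pid` is `-`, and prints PIDs whose type is exactly `C`. The name `compute_pids` is hypothetical, and the column layout may differ between driver versions, so check the header on your system first.

```bash
# Hypothetical helper: print PIDs of compute ("C") processes from pmon output.
# Assumes columns: gpu pid type sm mem enc dec command
compute_pids() {
  # Keep rows whose pid field is numeric (drops header lines and idle "-" rows)
  # and whose type field is exactly "C"
  awk '$2 ~ /^[0-9]+$/ && $3 == "C" { print $2 }'
}
```

Usage: `nvidia-smi pmon -c 1 | compute_pids`. Review the listed PIDs before passing any of them to `kill`.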