{"slug": "clustering-3-jetson-orin-nano-super", "title": "Clustering 3 Jetson Orin Nano Super", "summary": "A developer built a 3-node cluster using NVIDIA Jetson Orin Nano Super 8GB developer kits, achieving ~759 Mbps per link, peak 58.3°C under full load, and zero throttling at 1728 MHz. The cluster is designed for CUDA-accelerated distributed inference at the edge.", "body_md": "# Clustering 3 Jetson Orin Nano Super\n\nBuild a 3-node Jetson Orin Nano Super 8GB cluster with active cooling. Real numbers: ~759 Mbps per link (gigabit), peak 58.3°C across all 3 nodes under full 18-core sustained load, zero throttling at 1728 MHz throughout.\n\nTested on: NVIDIA Jetson Orin Nano 8GB Developer Kit, Ubuntu 22.04.5 LTS, kernel 5.15.148-tegra, L4T R36.4.7, CUDA 12.6, TP-Link TL-SG108E gigabit switch\n\nJetPack: R36 (release), REVISION: 4.7, CUDA 12.6, Driver 540.4.0\n\nYou have three Jetson Orin Nanos. Each has a 6-core ARM Cortex-A78 CPU, an Ampere GPU with CUDA 12.6, and 8GB LPDDR5. This is real GPU compute at the edge. Not a toy.\n\nThis guide walks you through setting up a real 3-node Jetson Orin Nano cluster. Real measured numbers from this exact hardware: **~759 Mbps** per link (gigabit), peak **58.3°C** across all 3 nodes under full 18-core sustained load, zero throttling at 1728 MHz throughout. Stable, thermally managed, and ready for CUDA-accelerated distributed inference.\n\nBy the end, your Nanos will boot, talk to each other with <2ms latency, and be ready for inference workloads across all 3 nodes.\n\n## Why This Setup?\n\n- **Real GPU compute on every node.** Each Orin Nano has a 1024-core Ampere GPU and CUDA 12.6 built in. Not CPU-only edge boxes.\n- **Active cooling keeps temps in check.** Under full 18-core sustained load across all 3 nodes, temps plateau at ~60°C, 35°C below the 95°C throttle threshold.\n- **MAXN_SUPER mode unlocks full performance.** Set `nvpmodel -m 2` and `jetson_clocks` on each node. 
Clocks pin at 1728 MHz CPU / 1020 MHz GPU.\n- **Simple network.** One gigabit switch. Star topology. All nodes at <2ms latency.\n\n## The Hardware\n\n| Component | Model | Notes |\n|---|---|---|\n| Nodes | 3x NVIDIA Jetson Orin Nano 8GB Developer Kit | 6x ARM Cortex-A78 @ 1728 MHz, 1024-core Ampere GPU @ 1020 MHz, 8GB LPDDR5 |\n| Switch | TP-Link TL-SG108E | 8-port gigabit Easy Smart switch; plug & play with default settings |\n| Power | 19V DC power adapter (included with dev kit) | Included in the box; use only the provided adapter |\n| Cables | Cat 6–Cat 8 Ethernet (×3) | Any Cat 6+ works |\n| Storage | microSD 128GB (×3) or NVMe SSD | JetPack + CUDA libs + models fill space fast; NVMe is faster |\n| Cooling | Active fan (included with dev kit) | Required; do not run cluster workloads fanless |\n| Case | 52Pi Raspberry Pi Cluster Case with 120mm RGB LED 5V Fan | Acrylic cluster enclosure with active 120mm top fan; fits the Orin Nano carrier boards with standard standoff spacing |\n\n## Real Performance Expectations\n\nNumbers measured on this exact hardware:\n\n**Throughput (measured, gigabit):**\n\n- nano-3→nano-1: **770 Mbps**\n- nano-3→nano-2: **759 Mbps**\n- nano-1→nano-2: **750 Mbps**\n\nnano-3 sending to both other nodes simultaneously: 391 + 365 Mbps (nano-3 NIC saturated at ~756 Mbps total).\n\n**Latency:** <2ms between nodes on the local switch. Measured: 0.5–1.3ms in real ping tests.\n\n**Thermals, single node (5-min stress, fan running):**\n\n- Idle: **~50°C** CPU / 49°C GPU\n- 1-core load: peak **55.2°C**, stabilises 54–55°C, 1728 MHz throughout\n- All 6 cores: peak **60.4°C**, stabilises 59–60°C, 1728 MHz throughout\n\n**Thermals, full cluster (18 cores across all 3 nodes, 10-min stress, measured on each node):**\n\n- nano-1 peak: **57.0°C**\n- nano-2 peak: **55.5°C**\n- nano-3 peak: **58.3°C** CPU / **56.8°C** GPU\n- All nodes held **1728 MHz** for the full 10 minutes. Zero throttling.\n
- 95°C throttle threshold gives **~37°C headroom** at peak cluster load.\n\n**GPU:** 1024-core Ampere @ 1020 MHz (MAXN_SUPER + jetson_clocks). CUDA 12.6, cuDNN 9.x, TensorRT 10.x.\n\n**Perfect for:**\n\n- Distributed GPU inference (split model or batch across nodes)\n- CUDA-accelerated preprocessing / ETL\n- Edge AI that runs entirely on-device\n- Learning how distributed GPU systems work\n\n**Not good for:**\n\n- High-bandwidth inter-node gradient sync at scale, since the nodes are linked via gigabit Ethernet (~760 Mbps measured per link). For distributed training, consider a cluster with a 10 Gbps switch or direct NVLink connections.\n\n## Step 1: Assemble the Hardware\n\n**Start with the case.** Follow the 52Pi cluster case user manual to assemble the acrylic layers and mount the standoffs before touching the Jetson boards, since it’s much easier to build the frame empty than to retrofit boards into it later.\n\nOnce the frame is built, seat each Orin Nano carrier board into its layer using the standoffs provided in the 52Pi kit, then connect the 120mm 5V RGB fan header to an available 5V GPIO or fan pin as shown in the manual. The case fan handles ambient airflow across all three nodes; the per-board fan on each Orin Nano still handles direct SoC cooling and must remain connected.\n\nAn SoC (system on a chip) integrates the CPU, GPU, memory, and other components into a single chip. The fan on the Orin Nano carrier board cools this critical component directly, while the case fan circulates air around the whole cluster.\n\nThe Orin Nano Developer Kit ships with a 19V power adapter. Connect it to the barrel jack on the carrier board. The board powers on automatically when connected. No power button press is required.\n\nMake sure each node’s fan is connected. 
The developer kit includes an active fan and it is mandatory for sustained workloads; connect it to the fan header on the carrier board before first boot.\n\nConnect each Orin Nano to the TL-SG108E switch via Cat 6 Ethernet:\n\n```\nnano-1 ──── Cat6 ──── TP-Link TL-SG108E port 1\nnano-2 ──── Cat6 ──── TP-Link TL-SG108E port 2\nnano-3 ──── Cat6 ──── TP-Link TL-SG108E port 3\n```\n\nPlug the switch into power. Wait 30 seconds for it to initialise.\n\n## Step 2: Install JetPack & First Boot\n\nFollow the **NVIDIA Jetson Orin Nano Developer Kit Quick Start Guide** to get each node booted and set up.\n\nTwo things to set consistently across all 3 nodes during setup:\n\n- **Username:** same on every node (e.g. `yuvrajsingh`)\n- **Hostname:** `nano-1`, `nano-2`, `nano-3`\n\nAfter the wizard, enable SSH on each node:\n\n```\nsudo systemctl enable --now ssh\n```\n\nThen set max performance mode. Default after install is 25W; switch to MAXN_SUPER:\n\n```\nsudo nvpmodel -m 2\n```\n\nDisconnect the monitor/keyboard. Everything from here is headless.\n\n## Step 4: Find Your IP Addresses\n\nSSH into nano-1:\n\n```\nssh yuvrajsingh@nano-1.local\n```\n\nFind the IP:\n\n```\nhostname -I\n```\n\nOutput (example):\n\n```\n192.168.1.11 172.17.0.1\n```\n\nThe first address is your LAN IP. Ignore `172.17.0.1` (Docker’s bridge). Repeat for all 3 Nanos.\n\n**Example IPs (yours will differ):**\n\n```\nnano-1: 192.168.1.11\nnano-2: 192.168.1.12\nnano-3: 192.168.1.13\n```\n\nWrite these down. 
You need them for the next step.\n\n## Step 4b: Assign Private Subnet IPs (Recommended)\n\n**Why?** A private subnet gives each Nano a stable, predictable address you control, isolates all cluster traffic to a known range, and makes SSH config, scripts, and inter-node communication unambiguous.\n\nWe’ll add a static secondary IP on `10.10.1.x/24` to each Nano’s `eth0` alongside the existing DHCP address.\n\nSSH into each Nano using its DHCP IP:\n\n```\nssh yuvrajsingh@192.168.1.11   # nano-1\n```\n\nCheck the NetworkManager connection name:\n\n```\nnmcli connection show\n```\n\nOutput:\n\n```\nNAME                UUID                                  TYPE      DEVICE\nWired connection 1  a1b2c3d4-...                          ethernet  eth0\n```\n\nAdd the static private IP:\n\n```\n# On nano-1:\nsudo nmcli connection modify \"Wired connection 1\" +ipv4.addresses \"10.10.1.1/24\"\nsudo nmcli connection up \"Wired connection 1\"\n```\n\nVerify:\n\n```\nip addr show eth0\n```\n\nExpected:\n\n```\n2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> ...\n    inet 192.168.1.11/24 brd 192.168.1.255 scope global dynamic noprefixroute eth0\n    inet 10.10.1.1/24 brd 10.10.1.255 scope global noprefixroute eth0\n```\n\nRepeat on all 3 Nanos:\n\n| Node | DHCP IP (Step 4) | Private IP to assign |\n|---|---|---|\n| nano-1 | 192.168.1.11 | `10.10.1.1` |\n| nano-2 | 192.168.1.12 | `10.10.1.2` |\n| nano-3 | 192.168.1.13 | `10.10.1.3` |\n\n**From here on, all steps use 10.10.1.x IPs.**\n\n## Step 5: Test All 3 Nodes Ping Each Other\n\nFrom your laptop (it needs its own address on `10.10.1.0/24`, or a route to it, to reach these IPs):\n\n```\nping 10.10.1.1\nping 10.10.1.2\nping 10.10.1.3\n```\n\n### 5a: SSH into nano-1 and ping the other nodes\n\n```\nssh yuvrajsingh@10.10.1.1\n```\n\nFrom there:\n\n```\nping 10.10.1.2\nping 10.10.1.3\n```\n\nExpected output:\n\n```\nPING 10.10.1.2 (10.10.1.2) 56(84) bytes of data.\n64 bytes from 10.10.1.2: icmp_seq=1 ttl=64 time=0.633 ms\n64 bytes from 10.10.1.2: icmp_seq=2 ttl=64 time=0.477 ms\n```\n\nMeasured on this hardware: 
**0.5–1.3ms** between nodes. All working? Move on.\n\n## Step 6: Set Up SSH Keys\n\nGenerate a key on your laptop (or any node you want to connect from):\n\n```\nssh-keygen -t ed25519 -f ~/.ssh/nano_cluster -N \"\"\n```\n\nThe `-N \"\"`\n\nskips the passphrase prompt, needed for passwordless SSH to work smoothly. You’ll see:\n\n```\nGenerating public/private ed25519 key pair.\nYour identification has been saved in /home/yuvrajsingh/.ssh/nano_cluster\nYour public key has been saved in /home/yuvrajsingh/.ssh/nano_cluster.pub\nThe key fingerprint is:\nSHA256:ej69Uum+V2f0xsSXr8/hZjqhGwuvEi/UehbtTqc5iSE yuvrajsingh@nano-3\nThe key's randomart image is:\n+--[ED25519 256]--+\n|                 |\n|               ..|\n|        S ..   ++|\n|       E +o. .. B|\n|      o *==ooo.* |\n|       *o=*=B.o+.|\n+----[SHA256]-----+\n```\n\nPre-add the other nodes’ host keys to skip the fingerprint prompt on first connect:\n\n```\nssh-keyscan -H 10.10.1.1 10.10.1.2 >> ~/.ssh/known_hosts\n```\n\nCopy the key to all nodes:\n\n```\nssh-copy-id -i ~/.ssh/nano_cluster.pub yuvrajsingh@10.10.1.1\nssh-copy-id -i ~/.ssh/nano_cluster.pub yuvrajsingh@10.10.1.2\nssh-copy-id -i ~/.ssh/nano_cluster.pub yuvrajsingh@10.10.1.3\n```\n\nEach will ask for the node’s password once, then show:\n\n```\n/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed\n/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys\nyuvrajsingh@10.10.1.1's password:\n\nNumber of key(s) added: 1\n\nNow try logging into the machine, with:   \"ssh 'yuvrajsingh@10.10.1.1'\"\nand check to make sure that only the key(s) you wanted were added.\n```\n\nTest passwordless login:\n\n```\nssh -i ~/.ssh/nano_cluster yuvrajsingh@10.10.1.1\n```\n\nNo password prompt. 
Type `exit`\n\n.\n\n## Step 7: Create SSH Config\n\nAppend to `~/.ssh/config`\n\n:\n\n```\ncat >> ~/.ssh/config << 'EOF'\n\nHost nano-1\n    HostName 10.10.1.1\n    User yuvrajsingh\n    IdentityFile ~/.ssh/nano_cluster\n    IdentitiesOnly yes\n\nHost nano-2\n    HostName 10.10.1.2\n    User yuvrajsingh\n    IdentityFile ~/.ssh/nano_cluster\n    IdentitiesOnly yes\n\nHost nano-3\n    HostName 10.10.1.3\n    User yuvrajsingh\n    IdentityFile ~/.ssh/nano_cluster\n    IdentitiesOnly yes\nEOF\n```\n\nTest the shortcuts:\n\n```\nssh nano-1\n```\n\nYou’ll see:\n\n```\nWelcome to Ubuntu 22.04.5 LTS (GNU/Linux 5.15.148-tegra aarch64)\n...\nyuvrajsingh@yuvrajsingh-jetson-nano1:~$\n```\n\nNo password. Type `exit`\n\nand repeat for `nano-2`\n\nand `nano-3`\n\n.\n\n## Step 8: Update & Set Performance Mode\n\n### Update the OS\n\n```\nssh nano-1\nsudo apt update && sudo apt upgrade -y\nsudo reboot\n```\n\nWait 30 seconds, repeat for nano-2 and nano-3.\n\n### Set MAXN_SUPER Performance Mode\n\nBy default, Orin Nano may boot in a lower power mode. The available modes on this hardware:\n\n| Mode ID | Name | Notes |\n|---|---|---|\n| 0 | 15W | Moderate performance |\n| 1 | 25W | High performance |\n| 2 | MAXN_SUPER | Maximum (use this) |\n| 3 | 7W | Low power |\n\nSwitch to MAXN_SUPER (mode 2) on each node:\n\n```\nsudo nvpmodel -m 2\n```\n\nVerify:\n\n```\nnvpmodel -q\n```\n\nExpected:\n\n```\nNV Power Mode: MAXN_SUPER\n2\n```\n\nLock all clocks to maximum:\n\n```\nsudo jetson_clocks\n```\n\nVerify:\n\n```\nsudo jetson_clocks --show\n```\n\nGPU should show `MinFreq=1020000000 MaxFreq=1020000000 CurrentFreq=1020000000`\n\n. 
CPU should be at 1728 MHz.\n\nCheck current CPU freq:\n\n```\ncat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq\n```\n\nExpected: `1728000`\n\n(1728 MHz).\n\n### Install Python and Tools\n\n```\nsudo apt install -y python3-pip python3-venv python3-dev git htop\n```\n\nInstall `jtop`\n\n, interactive monitoring for Jetson (GPU, CPU, temps, CUDA all in one):\n\n```\nsudo pip3 install jetson-stats\nsudo systemctl restart jtop.service\n```\n\nRun with: `sudo jtop`\n\nCreate a venv:\n\n```\npython3 -m venv ~/cluster_env\nsource ~/cluster_env/bin/activate\n```\n\nRepeat on all 3 Nanos.\n\n## Step 9: Test Bandwidth\n\nReal throughput test between nodes using `iperf3`\n\n.\n\nInstall on all Nanos:\n\n```\nsudo apt install -y iperf3\n```\n\nOn nano-1, start the server:\n\n```\nssh nano-1\niperf3 -s\n-----------------------------------------------------------\nServer listening on 5201\n-----------------------------------------------------------\n```\n\nOn nano-3, run the client against nano-1:\n\n```\niperf3 -c 10.10.1.1 -t 20 -f m\n```\n\nActual output from this hardware:\n\n```\nConnecting to host 10.10.1.1, port 5201\n[  5] local 10.10.1.3 port 56890 connected to 10.10.1.1 port 5201\n[ ID] Interval           Transfer     Bitrate         Retr  Cwnd\n[  5]   0.00-20.00  sec  1.79 GBytes   770 Mbits/sec    0             sender\n[  5]   0.00-20.04  sec  1.79 GBytes   766 Mbits/sec                  receiver\n```\n\n**770 Mbps**, real gigabit. 
Consistent across all three node pairs.\n\n### Bandwidth table (all numbers measured on this hardware)\n\n| Scenario | Throughput | Notes |\n|---|---|---|\n| nano-3 → nano-1 (single link) | 770 Mbps |\nGigabit link |\n| nano-3 → nano-2 (single link) | 759 Mbps |\nConsistent across all pairs |\n| nano-1 → nano-2 (single link) | 750 Mbps |\nAll three pairs within 20 Mbps of each other |\n| nano-3 → nano-1 AND nano-3 → nano-2 simultaneously | 391 + 365 Mbps |\nnano-3 NIC saturated (~756 Mbps total) |\n| All-to-all (nano-1→nano-2, nano-2→nano-3, nano-3→nano-1 simultaneously) | 709 / 693 / 695 Mbps |\nEach node sustaining ~700 Mbps while simultaneously receiving |\n\n### Latency matrix (all 6 pairs pinged concurrently, 20 packets each)\n\n| Pair | Min (ms) | Avg (ms) | Max (ms) | Packet Loss |\n|---|---|---|---|---|\n| nano-3 → nano-1 | 0.442 | 0.828 | 1.264 | 0% |\n| nano-3 → nano-2 | 0.426 | 0.562 | 0.748 | 0% |\n| nano-1 → nano-2 | 0.219 | 0.530 | 1.039 | 0% |\n| nano-1 → nano-3 | 0.298 | 0.670 | 1.160 | 0% |\n| nano-2 → nano-1 | 0.285 | 0.509 | 1.063 | 0% |\n| nano-2 → nano-3 | 0.239 | 0.440 | 0.950 | 0% |\n\n**Sub-millisecond average latency across all pairs under concurrent load. 
Zero packet loss.**\n\nFor full commands and monitoring procedure, see\n\n[Appendix B: Cluster Test Commands].\n\n## Step 11: Test Temperatures\n\nInstall `stress-ng`\n\n:\n\n```\nsudo apt install -y stress-ng\n```\n\nCheck baseline temperature:\n\n```\nfor zone in cpu-thermal gpu-thermal; do\n  idx=$(grep -rl \"^${zone}$\" /sys/class/thermal/thermal_zone*/type | grep -o '[0-9]*' | tail -1)\n  temp=$(cat /sys/class/thermal/thermal_zone${idx}/temp)\n  printf \"%-20s %.1f°C\\n\" \"$zone\" \"$(echo $temp | awk '{print $1/1000}')\"\ndone\n```\n\nOr use tegrastats for a full readout:\n\n```\nsudo tegrastats --interval 1000\n```\n\nOutput includes `CPU@XX.XC`\n\nand `GPU@XX.XC`\n\nfields.\n\nIdle on this hardware (fan running): **~50°C CPU, ~49°C GPU**.\n\n### Test 1: Single Core at 100% (5 min)\n\n```\nstress-ng --cpu 1 --timeout 300s\n```\n\nMonitor in another terminal:\n\n```\nwatch -n 5 'cat /sys/class/thermal/thermal_zone0/temp | awk \"{printf \\\"CPU: %.1f°C\\n\\\", \\$1/1000}\"'\n```\n\nResults on this hardware:\n\n| Time (s) | CPU (°C) | GPU (°C) | Freq (MHz) |\n|---|---|---|---|\n| 0 | 52.0 | 52.1 | 1728 |\n| 30 | 53.0 | 52.7 | 1728 |\n| 60 | 53.4 | 53.6 | 1728 |\n| 120 | 54.7 | 54.1 | 1728 |\n| 300 | 54.8 | 54.8 | 1728 |\n\n**Result:** Peaks at 55.2°C, stabilises at 54–55°C. Zero throttling. 1728 MHz throughout.\n\nFor commands and monitoring procedure, see\n\n[Appendix A: Single Node Test Commands].\n\n### Test 2: All 6 Cores at 100% (5 min)\n\n```\nstress-ng --cpu 6 --timeout 300s\n```\n\nResults on this hardware:\n\n| Time (s) | CPU (°C) | GPU (°C) | Freq (MHz) |\n|---|---|---|---|\n| 0 | 53.9 | 54.1 | 1728 |\n| 30 | 57.7 | 56.2 | 1728 |\n| 60 | 58.4 | 57.0 | 1728 |\n| 120 | 59.6 | 58.6 | 1728 |\n| 300 | 60.4 | 59.0 | 1728 |\n\n**Result:** Ramp +6.5°C in first 30s, cooling catches up within 60s. Plateaus at 59–60.4°C. 
Zero throttling.\n\nFor commands and monitoring procedure, see\n\n[Appendix A: Single Node Test Commands].\n\n### Test 3: Full Cluster 18 Cores at 100% (10 min)\n\nRun on all 3 Nanos simultaneously. Use SSH to start stress on the remote nodes while running locally; keep the SSH sessions alive in the foreground so the processes don’t die when the connection closes:\n\n```\nssh nano-1 \"stress-ng --cpu 6 --timeout 600s\" &\nssh nano-2 \"stress-ng --cpu 6 --timeout 600s\" &\nstress-ng --cpu 6 --timeout 600s &\nwait\n```\n\nMeasured from each node simultaneously:\n\n| Min | nano-1 (°C) | nano-2 (°C) | nano-3 CPU (°C) | nano-3 GPU (°C) | Freq (MHz) |\n|---|---|---|---|---|---|\n| 0 (baseline) | 54.5 | 54.0 | 54.4 | 54.4 | 1728 |\n| 1 | 56.2 | 54.6 | 57.2 | 55.8 | 1728 |\n| 2 | 56.4 | 54.9 | 57.5 | 56.4 | 1728 |\n| 3 | 56.5 | 54.7 | 57.8 | 56.1 | 1728 |\n| 5 | 56.8 | 55.3 | 57.8 | 56.3 | 1728 |\n| 7 | 56.9 | 54.9 | 57.9 | 56.5 | 1728 |\n| 10 | 55.2 | 54.3 | 55.0 | 54.7 | cooling |\n\n**Result:** nano-1 peaked at 57.0°C, nano-2 at 55.5°C, nano-3 at 58.3°C CPU / 56.8°C GPU. All 3 nodes held 1728 MHz for the full 10 minutes. Zero throttling. 
**37°C headroom** before the 95°C Orin throttle threshold.\n\nFor full commands and live monitoring procedure, see\n\n[Appendix B: Cluster Test Commands].\n\nAfter the test, verify clocks are still at max:\n\n```\ncat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq\n```\n\nExpected: `1728000`\n\n```\nnvpmodel -q\n```\n\nExpected: `MAXN_SUPER`\n\n/ `2`\n\n### Test 4: Full Cluster GPU Burn (5 min, all 3 nodes)\n\nInstall [gpu-burn](https://github.com/wilicc/gpu-burn), the standard CUDA GPU stress tool:\n\n```\ngit clone https://github.com/wilicc/gpu-burn\ncd gpu-burn && make\n```\n\nRun on all 3 nodes simultaneously (copy the binary to each node first via `scp`\n\n):\n\n```\nssh nano-1 \"cd /tmp && ./gpu_burn 300\" &\nssh nano-2 \"cd /tmp && ./gpu_burn 300\" &\n./gpu_burn 300 &\nwait\n```\n\nResults on this hardware:\n\n| Time | nano-1 CPU | nano-1 GPU | nano-2 CPU | nano-2 GPU | nano-3 CPU | nano-3 GPU | CPU Freq |\n|---|---|---|---|---|---|---|---|\n| 0 (baseline) | 49.2°C | 51.1°C | 49.2°C | 50.6°C | 48.5°C | 48.3°C | 1728 MHz |\n| 1m | 49.2°C | 50.9°C | 49.1°C | 50.5°C | 48.5°C | 48.5°C | 1114 MHz |\n| 3m | 49.8°C | 51.3°C | 49.2°C | 50.9°C | 48.2°C | 48.4°C | 1114 MHz |\n| 5m | 49.9°C | 51.3°C | 49.1°C | 50.5°C | 48.5°C | 48.9°C | 1190 MHz |\n\n**Result:** GPU temps peaked at **51.3°C**, barely above idle. CPU temps unchanged. CPU frequency dropped from 1728 to ~1114–1190 MHz because the GPU workload draws from the shared power budget in MAXN_SUPER mode. Zero GPU throttling (95°C threshold gives **~44°C headroom**).\n\nFor full commands and monitoring procedure, see\n\n[Appendix B: Cluster Test Commands].\n\n### Thermal Summary\n\n| Scenario | Nodes | Peak CPU | Peak GPU | Throttled? 
| CPU Clock |\n|---|---|---|---|---|---|\n| Idle | all 3 | ~50°C | ~49°C | No | 1728 MHz |\n| 1 core CPU stress, 5 min | nano-3 only | 55.2°C | 54.8°C | No | 1728 MHz |\n| All 6 cores CPU stress, 5 min | nano-3 only | 60.4°C | 59.1°C | No | 1728 MHz |\n| Full cluster 18 cores CPU stress, 10 min | all 3 | 58.3°C |\n56.8°C |\nNo | 1728 MHz |\n| Full cluster GPU burn (gpu-burn), 5 min | all 3 | 49.9°C | 51.3°C |\nNo | ~1114–1190 MHz |\n\n**Key finding:** The Ampere GPU runs remarkably cool under full compute load. CPU stress is the thermal ceiling for this cluster at 58.3°C, still 37°C from the 95°C throttle threshold.\n\n## Step 12: Test Inter-Node SSH (for distributed jobs)\n\nFrom nano-1, SSH to nano-2 without a password:\n\n```\nssh nano-1\nssh yuvrajsingh@10.10.1.2\n```\n\nIf it asks for a password, set up keys between the Nanos:\n\nOn nano-1:\n\n```\nssh-keygen -t ed25519\n# Press Enter for all defaults\nssh-copy-id -i ~/.ssh/id_ed25519.pub yuvrajsingh@10.10.1.2\nssh-copy-id -i ~/.ssh/id_ed25519.pub yuvrajsingh@10.10.1.3\n```\n\nRetry:\n\n```\nssh yuvrajsingh@10.10.1.2\n```\n\nWorks now. Repeat from each Nano to all others. 
Distributed frameworks (Ray, MPI) need passwordless SSH between all node pairs.\n\n## Debugging Commands\n\n| What to Check | Command |\n|---|---|\n| All temps (live) | `sudo tegrastats --interval 1000` |\n| CPU temp (quick) | `cat /sys/class/thermal/thermal_zone0/temp \\| awk '{printf \"%.1f°C\\n\", $1/1000}'` |\n| GPU temp | `cat /sys/class/thermal/thermal_zone1/temp \\| awk '{printf \"%.1f°C\\n\", $1/1000}'` |\n| Performance mode | `nvpmodel -q` |\n| All clock settings | `sudo jetson_clocks --show` |\n| CPU frequencies (all cores) | `cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq` |\n| GPU frequency | `cat /sys/devices/platform/gpu.0/devfreq/*/cur_freq` |\n| Full system monitor | `sudo jtop` |\n| Ethernet link speed | `ethtool enP8p1s0 \\| grep Speed` |\n| IP addresses | `hostname -I` |\n| Memory | `free -h` |\n| Disk | `df -h` |\n| Ping another node | `ping 10.10.1.2 -c 3` |\n| Bandwidth test (server) | `iperf3 -s` |\n| Bandwidth test (client) | `iperf3 -c <ip> -t 20 -f m` |\n\n## Common Issues\n\n### “Bandwidth is only ~94 Mbps instead of ~940 Mbps”\n\nThe TL-SG108E has **no physical DIP switches or buttons**. Unlike the LS110P, which had an Extend Mode DIP switch on the bottom, the SG108E is managed entirely through its **web interface**. A port manually set to 100 Mbps there will stay at 100 Mbps regardless of cable quality or driver.\n\n**Step 1: Check the LED on the switch port.**\n\nTL-SG108E port LEDs show negotiated speed:\n\n- **Green** = 1000 Mbps ✓\n- **Yellow/Amber** = 10/100 Mbps ← this is what you have\n\n**Step 2: Access the switch web UI.**\n\nConnect a laptop directly to the switch (or any device on the same subnet as the switch’s management IP). Factory default is `192.168.0.1`:\n\n```\nBrowser: http://192.168.0.1\nLogin: admin / admin  (factory default)\n```\n\n**Step 3: Fix the port speed.**\n\nGo to **Switching → Port Config**. Find the port your Nano is on. 
If Speed/Duplex is set to `100M Full` or `100M Half`, change it to **Auto** or **1000M Full**. Click **Apply**.\n\nThe port re-links within a few seconds and the LED turns green. iperf3 should now show gigabit-class throughput (~750–770 Mbps measured on this hardware).\n\n### “nvpmodel -m 2 doesn’t persist after reboot”\n\n- JetPack can reset to a default lower power mode on reboot.\n- Fix: add to `/etc/rc.local` (create it with a `#!/bin/sh` first line and make it executable if it does not already exist):\n\n```\n/usr/bin/nvpmodel -m 2\n/usr/bin/jetson_clocks\n```\n\n### “Can’t SSH from nano-1 to nano-2”\n\n- Did you set up inter-node SSH keys? Run `ssh-keygen -t ed25519` on nano-1, then `ssh-copy-id`.\n- Check `cat ~/.ssh/authorized_keys` on nano-2 to confirm nano-1’s key is present.\n- Verify `10.10.1.x` addresses are live on both nodes: `ip addr show eth0`.\n\n## My Cluster Layout\n\n| Node | Hostname | IP | Specs |\n|---|---|---|---|\n| nano-1 | yuvrajsingh-jetson-nano1 | 10.10.1.1 | Orin Nano 8GB, MAXN_SUPER, fan connected |\n| nano-2 | yuvrajsingh-jetson-nano2 | 10.10.1.2 | Orin Nano 8GB, MAXN_SUPER, fan connected |\n| nano-3 | yuvrajsingh-jetson-nano3 | 10.10.1.3 | Orin Nano 8GB, MAXN_SUPER, fan connected |\n\n**Network:** TP-Link TL-SG108E (gigabit switch), Cat 6–8 Ethernet, ports 1–3. Links negotiate at 1000 Mbps; ~750–770 Mbps measured per link.\n\nYou’re ready. 
Build something.\n\n**Built for smolcluster.** Distributed training and inference, from scratch, on your own hardware.\n\n## Final Checklist\n\n- All 3 Nanos boot successfully\n- Fan connected and spinning on each node\n- JetPack R36 (Ubuntu 22.04) flashed on all 3 microSD cards\n- OOB setup complete on all 3 Nanos (hostname, username, password)\n- All Nanos reachable via ping (<2ms latency)\n- Private subnet `10.10.1.x` configured on all nodes\n- SSH keys set up from laptop to all Nanos\n- Passwordless SSH works to all Nanos\n- SSH config created on your laptop\n- MAXN_SUPER mode active (`nvpmodel -m 2` + `jetson_clocks`)\n- Python 3 and tools installed on all Nanos\n- jtop installed and showing GPU/CPU readout\n- Bandwidth test shows ~750–770 Mbps per link (check the port speed in the switch web UI if stuck at ~94 Mbps)\n- Temperatures stable at <65°C under full load with fan running\n- nvpmodel + jetson_clocks added to `/etc/rc.local` for persistence\n\n## What’s Next?\n\nYou have a working 3-node Jetson Orin Nano cluster, each node with CUDA 12.6, cuDNN, and TensorRT ready to go. Ideas:\n\n- **Distributed GPU inference:** Split a model across nodes, or assign each node a batch partition. 
Use Ray Serve or a custom split-inference script.\n- **TensorRT optimization:** Convert ONNX models to TensorRT engines on-device for roughly a 2–4× inference speedup over vanilla PyTorch.\n- **Distributed preprocessing:** CUDA-accelerated ETL across 3 nodes in parallel.\n- **Edge monitoring:** Run a quantized LLM locally across the cluster; this is what smolcluster is for.\n\n## Appendix A: Single Node Test Commands\n\nThese are the exact commands used to produce the single-node results in Step 11.\n\n### Baseline temperature\n\n```\n# Quick per-zone readout\nfor zone in cpu-thermal gpu-thermal; do\n  idx=$(grep -rl \"^${zone}$\" /sys/class/thermal/thermal_zone*/type | grep -o '[0-9]*' | tail -1)\n  temp=$(cat /sys/class/thermal/thermal_zone${idx}/temp)\n  printf \"%-20s %.1f°C\\n\" \"$zone\" \"$(echo $temp | awk '{print $1/1000}')\"\ndone\n\n# Or live stream via tegrastats\nsudo tegrastats --interval 1000\n```\n\n### Single core stress (5 min) with live temp monitoring\n\n```\n# Terminal 1: run stress\nstress-ng --cpu 1 --timeout 300s\n\n# Terminal 2: watch CPU temp every 5s\nwatch -n 5 'cat /sys/class/thermal/thermal_zone0/temp | awk \"{printf \\\"CPU: %.1f°C\\n\\\", \\$1/1000}\"'\n```\n\n### All 6 cores stress (5 min) with live temp monitoring\n\n```\n# Terminal 1\nstress-ng --cpu 6 --timeout 300s\n\n# Terminal 2\nwatch -n 5 'cat /sys/class/thermal/thermal_zone0/temp | awk \"{printf \\\"CPU: %.1f°C\\n\\\", \\$1/1000}\"'\n```\n\n### Single node bandwidth (iperf3)\n\n```\n# On the server node\niperf3 -s\n\n# On the client node\niperf3 -c <server-ip> -t 20 -f m\n```\n\n### Single node GPU burn\n\n```\n# Build once\ngit clone https://github.com/wilicc/gpu-burn\ncd gpu-burn && make\n\n# Run (seconds as argument)\n./gpu_burn 300\n```\n\n## Appendix B: Cluster Test Commands\n\nThese are the exact commands used to produce all cluster-wide results in Steps 9 and 11.\n\n### All node pairs bandwidth (iperf3)\n\nStart an iperf3 server on each target node, then run the client from another:\n\n```\n# Start 
servers on nano-1 and nano-2\nssh nano-1 \"iperf3 -s -D\"\nssh nano-2 \"iperf3 -s -D\"\n\n# Test each pair individually (20s each)\niperf3 -c 10.10.1.1 -t 20 -f m   # nano-3 -> nano-1\niperf3 -c 10.10.1.2 -t 20 -f m   # nano-3 -> nano-2\nssh nano-1 \"iperf3 -c 10.10.1.2 -t 20 -f m\"  # nano-1 -> nano-2\n\n# Two links from nano-3 simultaneously\niperf3 -c 10.10.1.1 -t 20 -f m &\niperf3 -c 10.10.1.2 -t 20 -f m &\nwait\n```\n\n### All-to-all bandwidth (all 3 nodes sending simultaneously)\n\n```\n# Start iperf3 servers on all 3 nodes\niperf3 -s -D\nssh nano-1 \"iperf3 -s -D\"\nssh nano-2 \"iperf3 -s -D\"\nsleep 2\n\n# Each node sends to a different node at the same time\nssh nano-1 \"iperf3 -c 10.10.1.2 -t 20 -f m 2>&1 | tail -3\" &\nssh nano-2 \"iperf3 -c 10.10.1.3 -t 20 -f m 2>&1 | tail -3\" &\niperf3 -c 10.10.1.1 -t 20 -f m 2>&1 | tail -3 &\nwait\n```\n\n### Latency matrix (all 6 directional pairs, concurrent)\n\n```\nping -c 20 -q 10.10.1.1 2>&1 | tail -2 &   # nano-3 -> nano-1\nping -c 20 -q 10.10.1.2 2>&1 | tail -2 &   # nano-3 -> nano-2\nssh nano-1 \"ping -c 20 -q 10.10.1.2 2>&1 | tail -2\" &  # nano-1 -> nano-2\nssh nano-1 \"ping -c 20 -q 10.10.1.3 2>&1 | tail -2\" &  # nano-1 -> nano-3\nssh nano-2 \"ping -c 20 -q 10.10.1.1 2>&1 | tail -2\" &  # nano-2 -> nano-1\nssh nano-2 \"ping -c 20 -q 10.10.1.3 2>&1 | tail -2\" &  # nano-2 -> nano-3\nwait\n```\n\n### Full cluster CPU stress (18 cores, all 3 nodes)\n\nKeep SSH sessions alive in the foreground so stress processes don’t die when the connection closes:\n\n```\nssh nano-1 \"stress-ng --cpu 6 --timeout 600s\" &\nssh nano-2 \"stress-ng --cpu 6 --timeout 600s\" &\nstress-ng --cpu 6 --timeout 600s &\nwait\n```\n\nMonitor temps from each node while running (separate terminal):\n\n```\nwatch -n 60 '\n  echo \"nano-1: $(ssh nano-1 \"cat /sys/class/thermal/thermal_zone0/temp\" | awk \"{printf \\\"%.1f\\\", \\$1/1000}\")°C\"\n  echo \"nano-2: $(ssh nano-2 \"cat /sys/class/thermal/thermal_zone0/temp\" | awk \"{printf 
\\\"%.1f\\\", \\$1/1000}\")°C\"\n  echo \"nano-3: $(cat /sys/class/thermal/thermal_zone0/temp | awk \"{printf \\\"%.1f\\\", \\$1/1000}\")°C\"\n'\n```\n\n### Full cluster GPU burn (all 3 nodes)\n\nBuild gpu-burn on one node and copy to the others:\n\n```\n# Build on nano-3\ngit clone https://github.com/wilicc/gpu-burn\ncd gpu-burn && make\n\n# Copy to other nodes\nscp gpu_burn compare.fatbin nano-1:/tmp/\nscp gpu_burn compare.fatbin nano-2:/tmp/\n\n# Run on all 3 simultaneously (300 = seconds)\nssh nano-1 \"cd /tmp && ./gpu_burn 300\" &\nssh nano-2 \"cd /tmp && ./gpu_burn 300\" &\n./gpu_burn 300 &\nwait\n```\n\nMonitor GPU temps during the run:\n\n```\nwatch -n 30 'sudo tegrastats --interval 100 | head -1'\n```\n\n", "url": "https://wpnews.pro/news/clustering-3-jetson-orin-nano-super", "canonical_source": "https://www.smolhub.com/posts/jetson-orin-nano-cluster-setup-guide/", "published_at": "2026-06-05 07:00:00+00:00", "updated_at": "2026-06-28 10:40:49.292164+00:00", "lang": "en", "topics": ["ai-infrastructure"], "entities": ["NVIDIA", "Jetson Orin Nano", "CUDA", "TP-Link TL-SG108E", "Ubuntu", "JetPack", "TensorRT", "cuDNN"], "alternates": {"html": "https://wpnews.pro/news/clustering-3-jetson-orin-nano-super", "markdown": "https://wpnews.pro/news/clustering-3-jetson-orin-nano-super.md", "text": "https://wpnews.pro/news/clustering-3-jetson-orin-nano-super.txt", "jsonld": "https://wpnews.pro/news/clustering-3-jetson-orin-nano-super.jsonld"}}