Clustering 3 Jetson Orin Nano Super

A developer built a 3-node cluster using NVIDIA Jetson Orin Nano Super 8GB developer kits, achieving ~759 Mbps per link, peak 58.3°C under full load, and zero throttling at 1728 MHz. The cluster is designed for CUDA-accelerated distributed inference at the edge.

Build a 3-node Jetson Orin Nano Super 8GB cluster with active cooling. Real numbers: ~759 Mbps per gigabit link, a peak of 58.3°C across all 3 nodes under a full 18-core sustained load, and zero throttling at 1728 MHz throughout.

Tested on: NVIDIA Jetson Orin Nano 8GB Developer Kit, Ubuntu 22.04.5 LTS, kernel 5.15.148-tegra, L4T R36.4.7 (release string `R36 (release), REVISION: 4.7`), CUDA 12.6, driver 540.4.0, TP-Link TL-SG108E gigabit switch.

You have three Jetson Orin Nanos. Each has a 6-core ARM Cortex-A78 CPU, an Ampere GPU with CUDA 12.6, and 8GB of LPDDR5. This is real GPU compute at the edge, not a toy. This guide walks you through setting up a 3-node Jetson Orin Nano cluster that is stable, thermally managed, and ready for CUDA-accelerated distributed inference. By the end, your Nanos will boot, talk to each other with <2 ms latency, and be ready for inference workloads across all 3 nodes.

Why This Setup?

- Real GPU compute on every node. Each Orin Nano has a 1024-core Ampere GPU with CUDA 12.6 built in; these are not CPU-only edge boxes.
- Active cooling keeps temps in check. Under a full 18-core sustained load across all 3 nodes, temps stay below ~60°C (peak 58.3°C), about 37°C below the 95°C throttle threshold.
- MAXN SUPER mode unlocks full performance. Run `sudo nvpmodel -m 2` and `sudo jetson_clocks` on each node, and the clocks pin at 1728 MHz CPU / 1020 MHz GPU.
- Simple network. One gigabit switch, star topology, all nodes at <2 ms latency.
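All three nodes should run the same L4T release, so a quick consistency check is worth doing before going further. The sketch below is a hypothetical helper (not part of JetPack) that reads `/etc/nv_tegra_release` and extracts the version. It assumes the file's first line follows the `R36 (release), REVISION: 4.7` pattern quoted under Tested on.

```python
import re
from pathlib import Path

# Assumed header format of /etc/nv_tegra_release, e.g.
# "# R36 (release), REVISION: 4.7, GCID: ..., BOARD: generic, EABI: aarch64, DATE: ..."
RELEASE_PATTERN = re.compile(r"R(\d+) \(release\), REVISION: (\d+(?:\.\d+)*)")


def parse_l4t_release(text: str) -> str:
    """Return a version string such as 'R36.4.7' from nv_tegra_release contents."""
    match = RELEASE_PATTERN.search(text)
    if match is None:
        raise ValueError("no 'R<major> (release), REVISION: <x.y>' header found")
    major, revision = match.groups()
    return f"R{major}.{revision}"


def read_l4t_release(path: str = "/etc/nv_tegra_release") -> str:
    """Read and parse the release file on the node this runs on."""
    return parse_l4t_release(Path(path).read_text())
```

Call `read_l4t_release()` on each node. For the setup in this guide, every node should report `R36.4.7`.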
The Hardware

| Component | Model | Notes |
|---|---|---|
| Nodes | 3x NVIDIA Jetson Orin Nano 8GB Developer Kit | 6x ARM Cortex-A78 @ 1728 MHz, 1024-core Ampere GPU @ 1020 MHz, 8GB LPDDR5 |
| Switch | TP-Link TL-SG108E | 8-port gigabit Easy Smart switch; works plug & play, with a web UI for port settings |
| Power | 19V DC power adapter included with the dev kit | Included in the box; use only the provided adapter |
| Cables | Cat 6–Cat 8 Ethernet ×3 | Any Cat 6+ works |
| Storage | microSD 128GB ×3 or NVMe SSD | JetPack, CUDA libraries, and models fill space fast; NVMe is faster |
| Cooling | Active fan included with the dev kit | Required; do not run cluster workloads fanless |
| Case | 52Pi Raspberry Pi Cluster Case with 120mm RGB LED 5V fan | Acrylic cluster enclosure with an active 120mm top fan; fits the Orin Nano carrier boards with standard standoff spacing |

Real Performance Expectations

Numbers measured on this exact hardware:

Throughput (gigabit):
- nano-3 → nano-1: 770 Mbps
- nano-3 → nano-2: 759 Mbps
- nano-1 → nano-2: 750 Mbps
- nano-3 sending to both nodes simultaneously: 391 + 365 Mbps (nano-3's NIC saturated at ~756 Mbps total)

Latency: <2 ms between nodes on the local switch. Measured: 0.5–1.3 ms in real ping tests.

Thermals, single node (5-minute stress, fan running):
- Idle: ~50°C CPU / ~49°C GPU
- 1-core load: peak 55.2°C, stabilises at 54–55°C, 1728 MHz throughout
- All 6 cores: peak 60.4°C, stabilises at 59–60°C, 1728 MHz throughout

Thermals, full cluster (18 cores across all 3 nodes, 10-minute stress, measured on each node):
- nano-1 peak: 57.0°C
- nano-2 peak: 55.5°C
- nano-3 peak: 58.3°C CPU / 56.8°C GPU
- All nodes held 1728 MHz for the full 10 minutes. Zero throttling.
- The 95°C throttle threshold leaves ~37°C of headroom at peak cluster load.

GPU: 1024-core Ampere @ 1020 MHz (MAXN SUPER + jetson_clocks). CUDA 12.6, cuDNN 9.x, TensorRT 10.x.
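Two figures above are derived rather than measured directly: the ~756 Mbps NIC total and the ~37°C headroom. This small sketch recomputes both from the raw numbers listed, which is useful when you rerun the tests and want to compare. The 95°C threshold is the figure this guide uses. The script does not read it from the hardware.

```python
THROTTLE_C = 95.0  # throttle threshold quoted in this guide

# Raw measurements copied from the lists above.
single_link_mbps = {"nano-3->nano-1": 770, "nano-3->nano-2": 759, "nano-1->nano-2": 750}
dual_stream_from_nano3_mbps = (391, 365)
cluster_peak_c = {"nano-1": 57.0, "nano-2": 55.5, "nano-3": 58.3}


def nic_total(streams):
    """Total egress of one NIC when it sends several streams at once."""
    return sum(streams)


def headroom(peak_c, threshold_c=THROTTLE_C):
    """Degrees left before the throttle threshold, rounded to 0.1 °C."""
    return round(threshold_c - peak_c, 1)


total = nic_total(dual_stream_from_nano3_mbps)
best_single = max(single_link_mbps.values())
hottest = max(cluster_peak_c, key=cluster_peak_c.get)

print(f"nano-3 dual-stream total: {total} Mbps "
      f"({total / best_single:.0%} of the best single link)")
print(f"hottest node: {hottest}, headroom {headroom(cluster_peak_c[hottest])} °C")
```

Two streams from nano-3 add up to roughly what one stream achieves alone. That pattern supports reading the result as NIC saturation rather than a switch limit.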
Perfect for:
- Distributed GPU inference (split a model or a batch across nodes)
- CUDA-accelerated preprocessing / ETL
- Edge AI that runs entirely on-device
- Learning how distributed GPU systems work

Not good for:
- High-bandwidth inter-node gradient sync at scale, because the nodes are linked via gigabit Ethernet (~750–770 Mbps measured). For distributed training, consider a cluster with a 10 Gbps switch or direct NVLink connections.

Step 1: Assemble the Hardware

Start with the case. Follow the 52Pi cluster case user manual to assemble the acrylic layers and mount the standoffs before touching the Jetson boards. Building the frame empty is much easier than retrofitting boards into it later.

Once the frame is built, seat each Orin Nano carrier board in its layer using the standoffs provided in the 52Pi kit. Then connect the 120mm 5V RGB fan header to an available 5V GPIO or fan pin as shown in the manual. The case fan handles ambient airflow across all three nodes. The per-board fan on each Orin Nano still handles direct SoC cooling and must remain connected.

The SoC (system on a chip) integrates the CPU, GPU, and other core components into a single chip. The fan on each Orin Nano cools this critical component directly, while the case fan circulates air around the whole cluster.

The Orin Nano Developer Kit ships with a 19V power adapter. Connect it to the barrel jack on the carrier board. The board powers on automatically when connected, so there is no power button to press.

Make sure each node's fan is connected. The dev kit's active fan is mandatory for sustained workloads. Connect it to the fan header on the carrier board before first boot.

Connect each Orin Nano to the TL-SG108E switch via Cat 6 Ethernet:

```
nano-1 ──── Cat6 ──── TP-Link TL-SG108E port 1
nano-2 ──── Cat6 ──── TP-Link TL-SG108E port 2
nano-3 ──── Cat6 ──── TP-Link TL-SG108E port 3
```

Plug the switch into power and wait about 30 seconds for it to initialise.
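A back-of-the-envelope estimate puts the "Not good for" point in numbers. It assumes gradients are synchronised with a ring all-reduce, in which each node sends and receives about 2(N−1)/N times the gradient size per step. It ignores latency, protocol overhead, and any overlap with compute, so treat the result as a lower bound. The 1-billion-parameter fp16 example is illustrative and does not correspond to a model used in this guide.

```python
def ring_allreduce_seconds(num_params: float, bytes_per_param: int,
                           num_nodes: int, link_mbps: float) -> float:
    """Lower-bound time for one ring all-reduce of a full gradient.

    Each node moves 2 * (N - 1) / N * payload bytes over its own link,
    so time is per-node traffic (in bits) divided by link bandwidth.
    """
    payload_bytes = num_params * bytes_per_param
    per_node_bytes = 2 * (num_nodes - 1) / num_nodes * payload_bytes
    return per_node_bytes * 8 / (link_mbps * 1e6)


# Illustrative: 1B-parameter model, fp16 gradients (2 bytes each), 3 nodes,
# ~760 Mbps per link as measured in this guide.
seconds = ring_allreduce_seconds(1e9, 2, 3, 760)
print(f"~{seconds:.0f} s of pure transfer per optimizer step")
```

At gigabit speeds, that works out to roughly half a minute of communication per training step before any compute. Inference, which moves only activations or batch results, is a much better fit for this cluster.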
Step 2: Install JetPack & First Boot

Follow the NVIDIA Jetson Orin Nano Developer Kit Quick Start Guide to boot and set up each node. Set two things consistently across all 3 nodes during setup:
- Username: the same on every node (e.g. `yuvrajsingh`)
- Hostname: `nano-1`, `nano-2`, `nano-3`

After the setup wizard, enable SSH on each node:

```
sudo systemctl enable --now ssh
```

Then set max performance mode. The default after install is 25W; switch to MAXN SUPER:

```
sudo nvpmodel -m 2
```

Disconnect the monitor and keyboard. Everything from here on is headless.

Step 4: Find Your IP Addresses

SSH into nano-1 and find its IP:

```
ssh yuvrajsingh@nano-1.local
hostname -I
```

Output (example):

```
192.168.1.11 172.17.0.1
```

The first address is your LAN IP. Ignore 172.17.0.1 (Docker's bridge). Repeat for all 3 Nanos. Example IPs (yours will differ):
- nano-1: 192.168.1.11
- nano-2: 192.168.1.12
- nano-3: 192.168.1.13

Write these down. You need them for the next step.

Step 4b: Assign Private Subnet IPs (Recommended)

Why? A private subnet gives each Nano a stable, predictable address you control. It also isolates all cluster traffic to a known range, and it makes the SSH config, scripts, and inter-node communication unambiguous. We'll add a static secondary IP on 10.10.1.x/24 to each Nano's eth0, alongside the existing DHCP address.

SSH into each Nano using its DHCP IP:

```
ssh yuvrajsingh@192.168.1.11   # nano-1
```

Check the NetworkManager connection name:

```
nmcli connection show
```

Output:

```
NAME                UUID           TYPE      DEVICE
Wired connection 1  a1b2c3d4-...   ethernet  eth0
```

Add the static private IP. On nano-1:

```
sudo nmcli connection modify "Wired connection 1" +ipv4.addresses "10.10.1.1/24"
sudo nmcli connection up "Wired connection 1"
```

Verify:

```
ip addr show eth0
```

Expected:

```
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> ...
    inet 192.168.1.11/24 brd 192.168.1.255 scope global dynamic noprefixroute eth0
    inet 10.10.1.1/24 brd 10.10.1.255 scope global noprefixroute eth0
```

If the DEVICE column shows a different interface name (for example `enP8p1s0`, the name used in the debugging table later in this guide), substitute it for eth0.

Repeat on all 3 Nanos:

| Node | DHCP IP (Step 4) | Private IP to assign |
|---|---|---|
| nano-1 | 192.168.1.11 | 10.10.1.1 |
| nano-2 | 192.168.1.12 | 10.10.1.2 |
| nano-3 | 192.168.1.13 | 10.10.1.3 |

From here on, all steps use the 10.10.1.x IPs.

Step 5: Test All 3 Nodes Ping Each Other

From your laptop:

```
ping 10.10.1.1
ping 10.10.1.2
ping 10.10.1.3
```

Your laptop can reach these addresses only if it also has an address on 10.10.1.0/24. For example, add a secondary address such as 10.10.1.100/24 on the interface that connects to the switch. Otherwise, run these checks from one of the Nanos.

5a: SSH into nano-1 and ping the other nodes

```
ssh yuvrajsingh@10.10.1.1
```

From there:

```
ping 10.10.1.2
ping 10.10.1.3
```

Expected output:

```
PING 10.10.1.2 (10.10.1.2) 56(84) bytes of data.
64 bytes from 10.10.1.2: icmp_seq=1 ttl=64 time=0.633 ms
64 bytes from 10.10.1.2: icmp_seq=2 ttl=64 time=0.477 ms
```

Measured on this hardware: 0.5–1.3 ms between nodes. All working? Move on.

Step 6: Set Up SSH Keys

Generate a key on your laptop (or any node you want to connect from):

```
ssh-keygen -t ed25519 -f ~/.ssh/nano_cluster -N ""
```

The `-N ""` sets an empty passphrase, so the key works without a prompt. That suits unattended cluster scripts, at the cost of leaving the private key unprotected on disk. You'll see:

```
Generating public/private ed25519 key pair.
Your identification has been saved in /home/yuvrajsingh/.ssh/nano_cluster
Your public key has been saved in /home/yuvrajsingh/.ssh/nano_cluster.pub
The key fingerprint is:
SHA256:ej69Uum+V2f0xsSXr8/hZjqhGwuvEi/UehbtTqc5iSE yuvrajsingh@nano-3
The key's randomart image is:
...
```

The output ends with an ASCII-art box that differs for every key.

Pre-add the nodes' host keys to skip the fingerprint prompt on first connect. Include every node you will connect to:

```
ssh-keyscan -H 10.10.1.1 10.10.1.2 10.10.1.3 >> ~/.ssh/known_hosts
```

Copy the key to all nodes:

```
ssh-copy-id -i ~/.ssh/nano_cluster.pub yuvrajsingh@10.10.1.1
ssh-copy-id -i ~/.ssh/nano_cluster.pub yuvrajsingh@10.10.1.2
ssh-copy-id -i ~/.ssh/nano_cluster.pub yuvrajsingh@10.10.1.3
```

Each command asks for the node's password once, then shows:

```
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
yuvrajsingh@10.10.1.1's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'yuvrajsingh@10.10.1.1'"
and check to make sure that only the key(s) you wanted were added.
```

Test passwordless login:

```
ssh -i ~/.ssh/nano_cluster yuvrajsingh@10.10.1.1
```

There should be no password prompt. Type `exit`.

Step 7: Create SSH Config

Append to `~/.ssh/config`:

```
cat >> ~/.ssh/config << 'EOF'
Host nano-1
    HostName 10.10.1.1
    User yuvrajsingh
    IdentityFile ~/.ssh/nano_cluster
    IdentitiesOnly yes

Host nano-2
    HostName 10.10.1.2
    User yuvrajsingh
    IdentityFile ~/.ssh/nano_cluster
    IdentitiesOnly yes

Host nano-3
    HostName 10.10.1.3
    User yuvrajsingh
    IdentityFile ~/.ssh/nano_cluster
    IdentitiesOnly yes
EOF
```

Test the shortcuts:

```
ssh nano-1
```

You'll see:

```
Welcome to Ubuntu 22.04.5 LTS (GNU/Linux 5.15.148-tegra aarch64)
...
yuvrajsingh@yuvrajsingh-jetson-nano1:~$
```

No password. Type `exit` and repeat for nano-2 and nano-3.

Step 8: Update & Set Performance Mode

Update the OS:

```
ssh nano-1
sudo apt update && sudo apt upgrade -y
sudo reboot
```

Wait 30 seconds, then repeat for nano-2 and nano-3.

Set MAXN SUPER Performance Mode

By default, the Orin Nano may boot in a lower power mode.
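When you script benchmarks, it is worth confirming the mode before each run rather than assuming it. Here is a minimal sketch. It assumes `nvpmodel -q` prints the two-line format shown in the Expected output below: an `NV Power Mode: <name>` line followed by the numeric ID. The function names are hypothetical.

```python
import subprocess


def parse_nvpmodel_query(output: str) -> tuple[str, int]:
    """Return (mode_name, mode_id) from `nvpmodel -q` output.

    Any other lines (e.g. warnings) are skipped; the numeric ID is expected
    on the line right after "NV Power Mode: <name>".
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    for i, line in enumerate(lines):
        if line.startswith("NV Power Mode:"):
            name = line.split(":", 1)[1].strip()
            if i + 1 < len(lines) and lines[i + 1].isdigit():
                return name, int(lines[i + 1])
            raise ValueError("mode ID missing after 'NV Power Mode:' line")
    raise ValueError("no 'NV Power Mode:' line found")


def current_power_mode() -> tuple[str, int]:
    """Run `nvpmodel -q` on this node and parse the result."""
    result = subprocess.run(["nvpmodel", "-q"], capture_output=True,
                            text=True, check=True)
    return parse_nvpmodel_query(result.stdout)
```

A check like `current_power_mode() == ("MAXN_SUPER", 2)` then works as a one-line guard before a benchmark.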
The available modes on this hardware:

| Mode ID | Name | Notes |
|---|---|---|
| 0 | 15W | Moderate performance |
| 1 | 25W | High performance |
| 2 | MAXN SUPER | Maximum (use this) |
| 3 | 7W | Low power |

Switch to MAXN SUPER (mode 2) on each node:

```
sudo nvpmodel -m 2
```

Verify:

```
nvpmodel -q
```

Expected:

```
NV Power Mode: MAXN_SUPER
2
```

Lock all clocks to maximum:

```
sudo jetson_clocks
```

Verify:

```
sudo jetson_clocks --show
```

The GPU should show `MinFreq=1020000000 MaxFreq=1020000000 CurrentFreq=1020000000`, and the CPU should be at 1728 MHz. Check the current CPU frequency:

```
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
```

Expected: `1728000`. The value is in kHz, so this is 1728 MHz.

Install Python and Tools

```
sudo apt install -y python3-pip python3-venv python3-dev git htop
```

Install jtop, an interactive Jetson monitor that shows GPU, CPU, temps, and CUDA in one view:

```
sudo pip3 install jetson-stats
sudo systemctl restart jtop.service
```

Run it with:

```
sudo jtop
```

Create a venv:

```
python3 -m venv ~/cluster_env
source ~/cluster_env/bin/activate
```

Repeat on all 3 Nanos.

Step 9: Test Bandwidth

Measure real throughput between nodes with iperf3. Install it on all Nanos:

```
sudo apt install -y iperf3
```

On nano-1, start the server:

```
ssh nano-1
iperf3 -s
```

```
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
```

On nano-3, run the client against nano-1:

```
iperf3 -c 10.10.1.1 -t 20 -f m
```

Actual output from this hardware:

```
Connecting to host 10.10.1.1, port 5201
[  5] local 10.10.1.3 port 56890 connected to 10.10.1.1 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-20.00  sec  1.79 GBytes   770 Mbits/sec    0             sender
[  5]   0.00-20.04  sec  1.79 GBytes   766 Mbits/sec                  receiver
```

770 Mbps: real gigabit, consistent across all three node pairs.
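Reading the sender/receiver lines by eye is fine for a few runs. For repeated runs, iperf3's `-J` flag emits JSON instead. This sketch assumes the TCP result layout, where the totals sit under `end.sum_sent` and `end.sum_received` with a `bits_per_second` field. The helper names are hypothetical.

```python
import json
import subprocess


def throughput_mbps(iperf3_json: str) -> dict:
    """Extract sender/receiver totals (Mbit/s) from `iperf3 -J` output."""
    end = json.loads(iperf3_json)["end"]
    return {
        "sent": end["sum_sent"]["bits_per_second"] / 1e6,
        "received": end["sum_received"]["bits_per_second"] / 1e6,
    }


def run_iperf3(server_ip: str, seconds: int = 20) -> dict:
    """Run one client test against a node already running `iperf3 -s`."""
    proc = subprocess.run(
        ["iperf3", "-c", server_ip, "-t", str(seconds), "-J"],
        capture_output=True, text=True, check=True,
    )
    return throughput_mbps(proc.stdout)
```

Running `run_iperf3("10.10.1.1")` on nano-3, with `iperf3 -s` already running on nano-1, repeats the test above and returns numbers you can log or compare across runs.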
Bandwidth table (all numbers measured on this hardware)

| Scenario | Throughput | Notes |
|---|---|---|
| nano-3 → nano-1 (single link) | 770 Mbps | Gigabit link |
| nano-3 → nano-2 (single link) | 759 Mbps | Consistent across all pairs |
| nano-1 → nano-2 (single link) | 750 Mbps | All three pairs within 20 Mbps of each other |
| nano-3 → nano-1 and nano-3 → nano-2 simultaneously | 391 + 365 Mbps | nano-3 NIC saturated (~756 Mbps total) |
| All-to-all (nano-1→nano-2, nano-2→nano-3, nano-3→nano-1) simultaneously | 709 / 693 / 695 Mbps | Each node sustains ~700 Mbps while simultaneously receiving |

Latency matrix (all 6 pairs pinged concurrently, 20 packets each)

| Pair | Min (ms) | Avg (ms) | Max (ms) | Packet Loss |
|---|---|---|---|---|
| nano-3 → nano-1 | 0.442 | 0.828 | 1.264 | 0% |
| nano-3 → nano-2 | 0.426 | 0.562 | 0.748 | 0% |
| nano-1 → nano-2 | 0.219 | 0.530 | 1.039 | 0% |
| nano-1 → nano-3 | 0.298 | 0.670 | 1.160 | 0% |
| nano-2 → nano-1 | 0.285 | 0.509 | 1.063 | 0% |
| nano-2 → nano-3 | 0.239 | 0.440 | 0.950 | 0% |

Average latency is sub-millisecond across all pairs under concurrent load, with zero packet loss. For the full commands and monitoring procedure, see Appendix B: Cluster Test Commands.

Step 11: Test Temperatures

Install stress-ng:

```
sudo apt install -y stress-ng
```

Check the baseline temperature:

```
for zone in cpu-thermal gpu-thermal; do
  idx=$(grep -rl "^${zone}$" /sys/class/thermal/thermal_zone*/type | grep -o '[0-9]\+' | tail -1)
  temp=$(cat /sys/class/thermal/thermal_zone${idx}/temp)
  printf "%-20s %.1f°C\n" "$zone" "$(echo "$temp" | awk '{print $1/1000}')"
done
```

Or use tegrastats for a full readout:

```
sudo tegrastats --interval 1000
```

The output includes CPU@XX.XC and GPU@XX.XC fields. Idle on this hardware (fan running): ~50°C CPU, ~49°C GPU.
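The stress-test tables below were recorded by reading these values at intervals. Here is a hypothetical sketch that records the same three columns (CPU °C, GPU °C, CPU MHz) on a timer and reports the peaks and whether the clock ever dropped below 1728 MHz. It assumes thermal_zone0 is the CPU and thermal_zone1 is the GPU, as in the debugging table later in this guide. The zone-type loop above confirms the mapping on your image.

```python
import time
from pathlib import Path

TARGET_MHZ = 1728  # MAXN SUPER CPU clock used throughout this guide
CPU_TEMP = Path("/sys/class/thermal/thermal_zone0/temp")  # assumed CPU zone
GPU_TEMP = Path("/sys/class/thermal/thermal_zone1/temp")  # assumed GPU zone
CPU0_FREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")


def read_int(path: Path) -> int:
    return int(path.read_text().strip())


def sample() -> tuple[float, float, int]:
    """One reading: (cpu_c, gpu_c, cpu_mhz). sysfs reports m°C and kHz."""
    return (read_int(CPU_TEMP) / 1000, read_int(GPU_TEMP) / 1000,
            read_int(CPU0_FREQ) // 1000)


def record(duration_s: float, interval_s: float) -> list[tuple[float, float, int]]:
    """Take a reading every interval_s seconds for duration_s seconds."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        samples.append(sample())
        time.sleep(interval_s)
    return samples


def summarize(samples, target_mhz: int = TARGET_MHZ) -> dict:
    """Peak temperatures and whether every sample held the target clock."""
    return {
        "cpu_peak_c": max(s[0] for s in samples),
        "gpu_peak_c": max(s[1] for s in samples),
        "held_target_clock": all(s[2] >= target_mhz for s in samples),
    }
```

Start `stress-ng --cpu 6 --timeout 300s` in one terminal, then run `summarize(record(300, 5))` in another to get the same kind of result line as Test 2.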
Test 1: Single Core at 100% (5 min)

```
stress-ng --cpu 1 --timeout 300s
```

Monitor in another terminal:

```
watch -n 5 'cat /sys/class/thermal/thermal_zone0/temp | awk "{printf \"CPU: %.1f°C\n\", \$1/1000}"'
```

Results on this hardware:

| Time (s) | CPU (°C) | GPU (°C) | Freq (MHz) |
|---|---|---|---|
| 0 | 52.0 | 52.1 | 1728 |
| 30 | 53.0 | 52.7 | 1728 |
| 60 | 53.4 | 53.6 | 1728 |
| 120 | 54.7 | 54.1 | 1728 |
| 300 | 54.8 | 54.8 | 1728 |

Result: peaks at 55.2°C and stabilises at 54–55°C. Zero throttling; 1728 MHz throughout. For the commands and monitoring procedure, see Appendix A: Single Node Test Commands.

Test 2: All 6 Cores at 100% (5 min)

```
stress-ng --cpu 6 --timeout 300s
```

Results on this hardware:

| Time (s) | CPU (°C) | GPU (°C) | Freq (MHz) |
|---|---|---|---|
| 0 | 53.9 | 54.1 | 1728 |
| 30 | 57.7 | 56.2 | 1728 |
| 60 | 58.4 | 57.0 | 1728 |
| 120 | 59.6 | 58.6 | 1728 |
| 300 | 60.4 | 59.0 | 1728 |

Result: the CPU climbs +3.8°C in the first 30 s, then the rise slows after the first minute as cooling catches up. It plateaus at 59–60.4°C, +6.5°C over baseline. Zero throttling. For the commands and monitoring procedure, see Appendix A: Single Node Test Commands.

Test 3: Full Cluster, 18 Cores at 100% (10 min)

Run on all 3 Nanos simultaneously, starting stress on the remote nodes over SSH while also running it locally. The trailing `&` backgrounds each command on the local machine. Each SSH session stays open until its remote stress-ng finishes, so the remote processes are not cut short. `wait` blocks until all three exit:

```
ssh nano-1 "stress-ng --cpu 6 --timeout 600s" &
ssh nano-2 "stress-ng --cpu 6 --timeout 600s" &
stress-ng --cpu 6 --timeout 600s &
wait
```

Measured from each node simultaneously:

| Minute | nano-1 (°C) | nano-2 (°C) | nano-3 CPU (°C) | nano-3 GPU (°C) | Freq (MHz) |
|---|---|---|---|---|---|
| 0 (baseline) | 54.5 | 54.0 | 54.4 | 54.4 | 1728 |
| 1 | 56.2 | 54.6 | 57.2 | 55.8 | 1728 |
| 2 | 56.4 | 54.9 | 57.5 | 56.4 | 1728 |
| 3 | 56.5 | 54.7 | 57.8 | 56.1 | 1728 |
| 5 | 56.8 | 55.3 | 57.8 | 56.3 | 1728 |
| 7 | 56.9 | 54.9 | 57.9 | 56.5 | 1728 |
| 10 | 55.2 | 54.3 | 55.0 | 54.7 | cooling |

Result: nano-1 peaked at 57.0°C, nano-2 at 55.5°C, and nano-3 at 58.3°C CPU / 56.8°C GPU.
All 3 nodes held 1728 MHz for the full 10 minutes with zero throttling, leaving ~37°C of headroom below the Orin's 95°C throttle threshold. For the full commands and live monitoring procedure, see Appendix B: Cluster Test Commands.

After the test, verify the clocks are still at max:

```
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
# Expected: 1728000
nvpmodel -q
# Expected: NV Power Mode: MAXN_SUPER, then 2
```

Test 4: Full Cluster GPU Burn (5 min, all 3 nodes)

Install gpu-burn (https://github.com/wilicc/gpu-burn), a widely used CUDA GPU stress tool:

```
git clone https://github.com/wilicc/gpu-burn
cd gpu-burn && make
```

Run on all 3 nodes simultaneously. First copy the `gpu_burn` binary and `compare.fatbin` to /tmp on the remote nodes via scp (see Appendix B):

```
ssh nano-1 "cd /tmp && ./gpu_burn 300" &
ssh nano-2 "cd /tmp && ./gpu_burn 300" &
./gpu_burn 300 &
wait
```

Results on this hardware:

| Time | nano-1 CPU | nano-1 GPU | nano-2 CPU | nano-2 GPU | nano-3 CPU | nano-3 GPU | CPU Freq |
|---|---|---|---|---|---|---|---|
| 0 (baseline) | 49.2°C | 51.1°C | 49.2°C | 50.6°C | 48.5°C | 48.3°C | 1728 MHz |
| 1 min | 49.2°C | 50.9°C | 49.1°C | 50.5°C | 48.5°C | 48.5°C | 1114 MHz |
| 3 min | 49.8°C | 51.3°C | 49.2°C | 50.9°C | 48.2°C | 48.4°C | 1114 MHz |
| 5 min | 49.9°C | 51.3°C | 49.1°C | 50.5°C | 48.5°C | 48.9°C | 1190 MHz |

Result: GPU temps peaked at 51.3°C, barely above idle, and CPU temps were unchanged. CPU frequency dropped from 1728 MHz to ~1114–1190 MHz because the GPU workload draws from the shared power budget in MAXN SUPER mode. There was no GPU throttling; the 95°C threshold leaves ~44°C of headroom. For the full commands and monitoring procedure, see Appendix B: Cluster Test Commands.

Thermal Summary

| Scenario | Nodes | Peak CPU | Peak GPU | Throttled? | CPU Clock |
|---|---|---|---|---|---|
| Idle | all 3 | ~50°C | ~49°C | No | 1728 MHz |
| 1-core CPU stress, 5 min | nano-3 only | 55.2°C | 54.8°C | No | 1728 MHz |
| All 6 cores CPU stress, 5 min | nano-3 only | 60.4°C | 59.1°C | No | 1728 MHz |
| Full cluster, 18 cores CPU stress, 10 min | all 3 | 58.3°C | 56.8°C | No | 1728 MHz |
| Full cluster GPU burn (gpu-burn), 5 min | all 3 | 49.9°C | 51.3°C | No (clock reduced by power budget, not heat) | ~1114–1190 MHz |

Key finding: the Ampere GPU runs remarkably cool under full compute load. CPU stress sets the thermal ceiling for this cluster at 58.3°C, still ~37°C below the 95°C throttle threshold.

Step 12: Test Inter-Node SSH (for distributed jobs)

From nano-1, SSH to nano-2 without a password:

```
ssh nano-1
ssh yuvrajsingh@10.10.1.2
```

If it asks for a password, set up keys between the Nanos. On nano-1:

```
ssh-keygen -t ed25519   # press Enter for all defaults
ssh-copy-id -i ~/.ssh/id_ed25519.pub yuvrajsingh@10.10.1.2
ssh-copy-id -i ~/.ssh/id_ed25519.pub yuvrajsingh@10.10.1.3
```

Retry:

```
ssh yuvrajsingh@10.10.1.2
```

It works now. Repeat from each Nano to all the others. Distributed tooling such as MPI launchers and Ray's cluster launcher relies on passwordless SSH between all node pairs.
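Checking all six directions by hand is tedious. This sketch runs the check from your laptop using the `nano-1/2/3` aliases from Step 7: it hops to each node, then tries a non-interactive login to every other node. `BatchMode=yes` makes a password prompt (or an unknown-host-key prompt) fail immediately instead of hanging, so a first-contact host-key prompt also shows up as a failure. The IPs and username are the ones used in this guide; the function names are hypothetical.

```python
import itertools
import subprocess

# Node aliases (from the Step 7 SSH config) mapped to their private IPs.
NODES = {"nano-1": "10.10.1.1", "nano-2": "10.10.1.2", "nano-3": "10.10.1.3"}
USER = "yuvrajsingh"


def ordered_pairs(nodes: dict[str, str]) -> list[tuple[str, str]]:
    """Every (source, destination) pair: 3 nodes give 6 directions."""
    return list(itertools.permutations(nodes, 2))


def probe_command(src: str, dst_ip: str, user: str = USER) -> list[str]:
    """SSH to src, then from src try a non-interactive login to dst_ip."""
    return ["ssh", src, "ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
            f"{user}@{dst_ip}", "true"]


def check_all(nodes: dict[str, str] = NODES) -> dict[tuple[str, str], bool]:
    """True for each direction where passwordless SSH succeeded."""
    results = {}
    for src, dst in ordered_pairs(nodes):
        proc = subprocess.run(probe_command(src, nodes[dst]), capture_output=True)
        results[(src, dst)] = proc.returncode == 0
    return results
```

`[pair for pair, ok in check_all().items() if not ok]` lists any direction that still needs `ssh-copy-id`.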
Debugging Commands

| What to Check | Command |
|---|---|
| All temps, live | `sudo tegrastats --interval 1000` |
| CPU temp (quick) | `cat /sys/class/thermal/thermal_zone0/temp \| awk '{printf "%.1f°C\n", $1/1000}'` |
| GPU temp | `cat /sys/class/thermal/thermal_zone1/temp \| awk '{printf "%.1f°C\n", $1/1000}'` |
| Performance mode | `nvpmodel -q` |
| All clock settings | `sudo jetson_clocks --show` |
| CPU frequencies (all cores) | `cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq` |
| GPU frequency | `cat /sys/devices/platform/gpu.0/devfreq/*/cur_freq` |
| Full system monitor | `sudo jtop` |
| Ethernet link speed | `ethtool enP8p1s0 \| grep Speed` (substitute the interface name from `nmcli connection show`) |
| IP addresses | `hostname -I` |
| Memory | `free -h` |
| Disk | `df -h` |
| Ping another node | `ping 10.10.1.2 -c 3` |
| Bandwidth test server | `iperf3 -s` |
| Bandwidth test client | `iperf3 -c <ip> -t 20 -f m` |

Common Issues

"Bandwidth is only ~94 Mbps instead of gigabit speed"

The TL-SG108E has no physical DIP switches or buttons. Unlike the LS110P, which has an Extend Mode DIP switch on the bottom, the SG108E is managed entirely through its web interface. A port manually set to 100 Mbps there will stay at 100 Mbps regardless of cable quality or driver.

Step 1: Check the LED on the switch port. TL-SG108E port LEDs show the negotiated speed:
- Green = 1000 Mbps ✓
- Yellow/Amber = 10/100 Mbps ← this is what you have

Step 2: Access the switch web UI. Connect a laptop directly to the switch, or use any device on the same subnet as the switch's management IP. The factory default is 192.168.0.1:
- Browser: http://192.168.0.1
- Login: admin / admin (factory default)

Step 3: Fix the port speed. Go to Switching → Port Config and find the port your Nano is on. If Speed/Duplex is set to 100M Full or 100M Half, change it to Auto or 1000M Full, then click Apply. The port re-links within a few seconds and the LED turns green. iperf3 should then report gigabit-class throughput (~750–770 Mbps on this hardware).

"nvpmodel -m 2 doesn't persist after reboot"
- JetPack can reset to a default lower power mode on reboot.
- Fix: add these commands to /etc/rc.local. If the file doesn't exist, create it with a `#!/bin/sh` first line. Make it executable with `sudo chmod +x /etc/rc.local` so systemd's rc-local service runs it at boot:

```
#!/bin/sh
/usr/bin/nvpmodel -m 2
/usr/bin/jetson_clocks
```

"Can't SSH from nano-1 to nano-2"
- Did you set up inter-node SSH keys? Run `ssh-keygen -t ed25519` on nano-1, then `ssh-copy-id`.
- Check `cat ~/.ssh/authorized_keys` on nano-2 to confirm nano-1's key is present.
- Verify the 10.10.1.x addresses are live on both nodes: `ip addr show eth0`.

My Cluster Layout

| Node | Hostname | IP | Specs |
|---|---|---|---|
| nano-1 | yuvrajsingh-jetson-nano1 | 10.10.1.1 | Orin Nano 8GB, MAXN SUPER, fan connected |
| nano-2 | yuvrajsingh-jetson-nano2 | 10.10.1.2 | Orin Nano 8GB, MAXN SUPER, fan connected |
| nano-3 | yuvrajsingh-jetson-nano3 | 10.10.1.3 | Orin Nano 8GB, MAXN SUPER, fan connected |

Network: TP-Link TL-SG108E gigabit switch, Cat 6–8 Ethernet, ports 1–3, negotiating at 1000 Mbps (~750–770 Mbps measured with iperf3).

You're ready. Build something.

Built for smolcluster: distributed training and inference, from scratch, on your own hardware.

Final Checklist
- All 3 Nanos boot successfully
- Fan connected and spinning on each node
- JetPack R36 (Ubuntu 22.04) flashed on all 3 microSD cards
- OOB setup complete on all 3 Nanos (hostname, username, password)
- All Nanos reachable via ping (<2 ms latency)
- Private subnet 10.10.1.x configured on all nodes
- SSH keys set up from laptop to all Nanos
- Passwordless SSH works to all Nanos
- SSH config created on your laptop
- MAXN SUPER mode active (`nvpmodel -m 2` + `jetson_clocks`)
- Python 3 and tools installed on all Nanos
- jtop installed and showing GPU/CPU readout
- Bandwidth test shows ~750–770 Mbps per link (if stuck near ~94 Mbps, check the port's Speed/Duplex setting in the switch web UI)
- Temperatures stable at <65°C under full load with fan running
- nvpmodel + jetson_clocks added to /etc/rc.local for persistence

What's Next?

You have a working 3-node Jetson Orin Nano cluster, each node with CUDA 12.6, cuDNN, and TensorRT ready to go. Ideas:
- Distributed GPU inference: split a model across nodes, or assign each node a batch partition.
  Use Ray Serve or a custom split-inference script.
- TensorRT optimization: convert ONNX models to TensorRT engines on-device for a 2–4× inference speedup over vanilla PyTorch.
- Distributed preprocessing: CUDA-accelerated ETL across 3 nodes in parallel.
- Edge monitoring: run a quantized LLM locally across the cluster; this is what smolcluster is for.

Appendix A: Single Node Test Commands

These are the exact commands used to produce the single-node results in Step 11.

Baseline temperature

```
# Quick per-zone readout
for zone in cpu-thermal gpu-thermal; do
  idx=$(grep -rl "^${zone}$" /sys/class/thermal/thermal_zone*/type | grep -o '[0-9]\+' | tail -1)
  temp=$(cat /sys/class/thermal/thermal_zone${idx}/temp)
  printf "%-20s %.1f°C\n" "$zone" "$(echo "$temp" | awk '{print $1/1000}')"
done

# Or live stream via tegrastats
sudo tegrastats --interval 1000
```

Single-core stress (5 min) with live temp monitoring

```
# Terminal 1: run stress
stress-ng --cpu 1 --timeout 300s

# Terminal 2: watch CPU temp every 5 s
watch -n 5 'cat /sys/class/thermal/thermal_zone0/temp | awk "{printf \"CPU: %.1f°C\n\", \$1/1000}"'
```

All-6-cores stress (5 min) with live temp monitoring

```
# Terminal 1
stress-ng --cpu 6 --timeout 300s

# Terminal 2
watch -n 5 'cat /sys/class/thermal/thermal_zone0/temp | awk "{printf \"CPU: %.1f°C\n\", \$1/1000}"'
```

Single-node bandwidth (iperf3)

```
# On the server node
iperf3 -s

# On the client node
iperf3 -c <server-ip> -t 20 -f m
```

Single-node GPU burn

```
# Build once
git clone https://github.com/wilicc/gpu-burn
cd gpu-burn && make

# Run (duration in seconds as the argument)
./gpu_burn 300
```

Appendix B: Cluster Test Commands

These are the exact commands used to produce all cluster-wide results in Steps 9 and 11.
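Most commands in this appendix follow one pattern: start the same command on several nodes with `ssh host "..." &`, run it locally with `&`, then `wait`. If you prefer to drive the tests from Python, for example to capture each node's output separately, here is a minimal sketch of that pattern using threads and `subprocess`. `fan_out` and `build_command` are hypothetical helpers. `None` stands for the local node, and remote hosts are the SSH aliases from Step 7.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(host, command: str) -> list[str]:
    """Local node when host is None, otherwise run over SSH via its alias."""
    if host is None:
        return ["bash", "-c", command]
    return ["ssh", host, command]


def fan_out(hosts, command: str, runner=subprocess.run) -> dict:
    """Start `command` on all hosts at once and wait for every one to finish.

    Mirrors `ssh a "cmd" & ssh b "cmd" & cmd & wait`, but keeps each host's
    result (return code, stdout, stderr) separately.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        futures = {
            host: pool.submit(runner, build_command(host, command),
                              capture_output=True, text=True)
            for host in hosts
        }
        return {host: future.result() for host, future in futures.items()}
```

For example, `fan_out(["nano-1", "nano-2", None], "stress-ng --cpu 6 --timeout 600s")` reproduces the full-cluster CPU stress, and each value's `.returncode` shows whether that node's run finished cleanly.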
All node pairs bandwidth (iperf3)

Start an iperf3 server on each target node, then run the client from another node:

```
# Start servers on nano-1 and nano-2
ssh nano-1 "iperf3 -s -D"
ssh nano-2 "iperf3 -s -D"

# Test each pair individually (20 s each)
iperf3 -c 10.10.1.1 -t 20 -f m                 # nano-3 -> nano-1
iperf3 -c 10.10.1.2 -t 20 -f m                 # nano-3 -> nano-2
ssh nano-1 "iperf3 -c 10.10.1.2 -t 20 -f m"    # nano-1 -> nano-2

# Two links from nano-3 simultaneously
iperf3 -c 10.10.1.1 -t 20 -f m &
iperf3 -c 10.10.1.2 -t 20 -f m &
wait
```

All-to-all bandwidth (all 3 nodes sending simultaneously)

```
# Start iperf3 servers on all 3 nodes (skip any still running from the previous test)
iperf3 -s -D
ssh nano-1 "iperf3 -s -D"
ssh nano-2 "iperf3 -s -D"
sleep 2

# Each node sends to a different node at the same time
ssh nano-1 "iperf3 -c 10.10.1.2 -t 20 -f m 2>&1 | tail -3" &
ssh nano-2 "iperf3 -c 10.10.1.3 -t 20 -f m 2>&1 | tail -3" &
iperf3 -c 10.10.1.1 -t 20 -f m 2>&1 | tail -3 &
wait
```

Latency matrix (all 6 directional pairs, concurrent)

```
ping -c 20 -q 10.10.1.1 2>&1 | tail -2 &                     # nano-3 -> nano-1
ping -c 20 -q 10.10.1.2 2>&1 | tail -2 &                     # nano-3 -> nano-2
ssh nano-1 "ping -c 20 -q 10.10.1.2 2>&1 | tail -2" &        # nano-1 -> nano-2
ssh nano-1 "ping -c 20 -q 10.10.1.3 2>&1 | tail -2" &        # nano-1 -> nano-3
ssh nano-2 "ping -c 20 -q 10.10.1.1 2>&1 | tail -2" &        # nano-2 -> nano-1
ssh nano-2 "ping -c 20 -q 10.10.1.3 2>&1 | tail -2" &        # nano-2 -> nano-3
wait
```

Full cluster CPU stress (18 cores, all 3 nodes)

Each SSH session stays open in the background until its remote stress-ng exits, so the remote processes are not cut short. `wait` blocks until all three finish:

```
ssh nano-1 "stress-ng --cpu 6 --timeout 600s" &
ssh nano-2 "stress-ng --cpu 6 --timeout 600s" &
stress-ng --cpu 6 --timeout 600s &
wait
```

Monitor temps from each node while it runs (separate terminal):

```
watch -n 60 '
echo "nano-1: $(ssh nano-1 "cat /sys/class/thermal/thermal_zone0/temp" | awk "{printf \"%.1f\", \$1/1000}")°C"
echo "nano-2: $(ssh nano-2 "cat /sys/class/thermal/thermal_zone0/temp" | awk "{printf \"%.1f\", \$1/1000}")°C"
echo "nano-3: $(cat /sys/class/thermal/thermal_zone0/temp | awk "{printf \"%.1f\", \$1/1000}")°C"
'
```

Full cluster GPU burn (all 3 nodes)

Build gpu-burn on one node and copy it to the others:

```
# Build on nano-3
git clone https://github.com/wilicc/gpu-burn
cd gpu-burn && make

# Copy to the other nodes
scp gpu_burn compare.fatbin nano-1:/tmp/
scp gpu_burn compare.fatbin nano-2:/tmp/

# Run on all 3 simultaneously (300 = seconds)
ssh nano-1 "cd /tmp && ./gpu_burn 300" &
ssh nano-2 "cd /tmp && ./gpu_burn 300" &
./gpu_burn 300 &
wait
```

Monitor GPU temps during the run:

```
watch -n 30 'sudo tegrastats --interval 100 | head -1'
```
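The `watch` command above prints one raw tegrastats line every 30 seconds. To keep a log of the temperature fields instead, here is a sketch. It assumes temperatures appear as `NAME@VALUEC` tokens, matching the CPU@XX.XC and GPU@XX.XC fields mentioned in Step 11. Sensor names are lower-cased so either spelling matches. `parse_tegrastats_temps` and `stream_temps` are hypothetical helpers, not part of tegrastats.

```python
import re
import subprocess
from typing import Iterator

# Matches temperature tokens such as "CPU@50.1C" or "gpu@49.031C";
# tokens like "2%@1728" (CPU load@MHz) don't match because they lack a
# leading letter and a trailing "C".
TEMP_TOKEN = re.compile(r"\b([A-Za-z]\w*)@(-?\d+(?:\.\d+)?)C\b")


def parse_tegrastats_temps(line: str) -> dict[str, float]:
    """Map lower-cased sensor names to °C for one tegrastats line."""
    return {name.lower(): float(value) for name, value in TEMP_TOKEN.findall(line)}


def stream_temps(interval_ms: int = 1000) -> Iterator[dict[str, float]]:
    """Yield one temperature dict per tegrastats sample (uses sudo, like above)."""
    proc = subprocess.Popen(["sudo", "tegrastats", "--interval", str(interval_ms)],
                            stdout=subprocess.PIPE, text=True)
    try:
        for line in proc.stdout:
            yield parse_tegrastats_temps(line)
    finally:
        # Stop tegrastats when the caller stops iterating.
        proc.terminate()
        proc.wait()
```

While gpu-burn runs, iterate over `stream_temps()` and keep the `gpu` readings to build a table like the Test 4 results.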