Clustering 3 Jetson Orin Nano Super

A developer built a 3-node cluster using NVIDIA Jetson Orin Nano Super 8GB developer kits, achieving ~759 Mbps per link, peak 58.3°C under full load, and zero throttling at 1728 MHz. The cluster is designed for CUDA-accelerated distributed inference at the edge.

Build a 3-node Jetson Orin Nano Super 8GB cluster with active cooling. Real numbers: ~759 Mbps per link (gigabit), peak 58.3°C across all 3 nodes under full 18-core sustained load, zero throttling at 1728 MHz throughout.

Tested on: NVIDIA Jetson Orin Nano 8GB Developer Kit, Ubuntu 22.04.5 LTS, kernel 5.15.148-tegra, L4T R36.4.7, CUDA 12.6, TP-Link TL-SG108E gigabit switch

JetPack: R36 (release), REVISION: 4.7, CUDA 12.6, Driver 540.4.0

You have three Jetson Orin Nanos. Each has a 6-core ARM Cortex-A78 CPU, an Ampere GPU with CUDA 12.6, and 8GB LPDDR5. This is real GPU compute at the edge. Not a toy.

This guide walks you through setting up a real 3-node Jetson Orin Nano cluster. Real measured numbers from this exact hardware: ~759 Mbps per link (gigabit), peak 58.3°C across all 3 nodes under full 18-core sustained load, zero throttling at 1728 MHz throughout. Stable, thermally managed, and ready for CUDA-accelerated distributed inference.

By the end, your Nanos will boot, talk to each other with <2ms latency, and be ready for inference workloads across all 3 nodes.

Why This Setup?

- Real GPU compute on every node. Each Orin Nano has a 1024-core Ampere GPU and CUDA 12.6 built in. Not CPU-only edge boxes.
- Active cooling keeps temps in check. Under full 18-core sustained load across all 3 nodes, temps plateau at ~60°C, 35°C below the 95°C throttle threshold.
- MAXN SUPER mode unlocks full performance. Run `nvpmodel -m 2` and `jetson_clocks` on each node. Clocks pin at 1728 MHz CPU / 1020 MHz GPU.
- Simple network. One unmanaged switch. Star topology. All nodes at <2ms latency.
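As a rough illustration of the MAXN SUPER point, the two commands can be wrapped in a small script that is run on each node. This is only a sketch: the helper names `run` and `set_max_perf` and the `DRY_RUN` variable are made up for this example, and the script defaults to a dry run so it prints the commands instead of executing them. The final `nvpmodel -q` call queries the active power mode so you can confirm the change.

```shell
#!/usr/bin/env bash
# Sketch: apply max-performance settings on one node.
# DRY_RUN (hypothetical variable) defaults to 1, so nothing is changed
# unless you explicitly run with DRY_RUN=0.

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "[dry-run] $*"
    else
        "$@"
    fi
}

set_max_perf() {
    run sudo nvpmodel -m 2    # mode 2 = MAXN SUPER, per this guide
    run sudo jetson_clocks    # pin clocks at their maximum
    run sudo nvpmodel -q      # query the active power mode to confirm
}

set_max_perf
```

Run it once as-is to check what it would do, then again with `DRY_RUN=0` on each node to apply the settings.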
The Hardware

| Component | Model | Notes |
|---|---|---|
| Nodes | 3x NVIDIA Jetson Orin Nano 8GB Developer Kit | 6x ARM Cortex-A78 @ 1728 MHz, 1024-core Ampere GPU @ 1020 MHz, 8GB LPDDR5 |
| Switch | TP-Link TL-SG108E | 8-port unmanaged gigabit switch, plug & play |
| Power | 19V DC power adapter included with dev kit | Included in the box; use only the provided adapter |
| Cables | Cat 6–Cat 8 Ethernet ×3 | Any Cat 6+ works |
| Storage | microSD 128GB ×3 or NVMe SSD | JetPack + CUDA libs + models fill space fast; NVMe is faster |
| Cooling | Active fan included with dev kit | Required; do not run cluster workloads fanless |
| Case | 52Pi Raspberry Pi Cluster Case with 120mm RGB LED 5V Fan | Acrylic cluster enclosure with active 120mm top fan; fits the Orin Nano carrier boards with standard standoff spacing |

Real Performance Expectations

Numbers measured on this exact hardware:

Throughput (measured, gigabit):
- nano-3→nano-1: 770 Mbps
- nano-3→nano-2: 759 Mbps
- nano-1→nano-2: 750 Mbps
- nano-3 to both nodes simultaneously: 391 + 365 Mbps (nano-3 NIC saturated at ~756 Mbps total)

Latency: <2ms between nodes on the local switch. Measured: 0.5–1.3ms in real ping tests.

Thermals, single node (5-min stress, fan running):
- Idle: ~50°C CPU / 49°C GPU
- 1-core load: peak 55.2°C, stabilises at 54–55°C, 1728 MHz throughout
- All 6 cores: peak 60.4°C, stabilises at 59–60°C, 1728 MHz throughout

Thermals, full cluster (18 cores across all 3 nodes, 10-min stress, measured on each node):
- nano-1 peak: 57.0°C
- nano-2 peak: 55.5°C
- nano-3 peak: 58.3°C CPU / 56.8°C GPU
- All nodes held 1728 MHz for the full 10 minutes. Zero throttling.
- The 95°C throttle threshold gives ~37°C headroom at peak cluster load.

GPU: 1024-core Ampere @ 1020 MHz (MAXN SUPER + `jetson_clocks`). CUDA 12.6, cuDNN 9.x, TensorRT 10.x.
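The headroom figure is simply the throttle threshold minus the measured peak. A minimal sketch of that arithmetic, using the cluster peaks reported in this guide (the function name `headroom` and the variable `THROTTLE_C` are invented for this example):

```shell
#!/usr/bin/env bash
# Thermal headroom = throttle threshold - measured peak.
# Peak values are the full-cluster measurements quoted in this guide.
THROTTLE_C=95

headroom() {
    # $1 = peak temperature in °C; prints headroom with one decimal place
    awk -v limit="$THROTTLE_C" -v peak="$1" 'BEGIN { printf "%.1f\n", limit - peak }'
}

for entry in nano-1:57.0 nano-2:55.5 nano-3:58.3; do
    node="${entry%%:*}"   # text before the colon
    peak="${entry##*:}"   # text after the colon
    echo "$node: peak ${peak}°C, headroom $(headroom "$peak")°C"
done
```

The hottest node, nano-3, is the one that sets the ~37°C cluster-wide headroom.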
Perfect for:
- Distributed GPU inference (split model or batch across nodes)
- CUDA-accelerated preprocessing / ETL
- Edge AI that runs entirely on-device
- Learning how distributed GPU systems work

Not good for:
- High-bandwidth inter-node gradient sync at scale, since the nodes are linked via gigabit Ethernet (~759 Mbps measured per link). For distributed training, consider a cluster with a 10 Gbps switch or direct NVLink connections.

Step 1: Assemble the Hardware

Start with the case. Follow the 52Pi cluster case user manual to assemble the acrylic layers and mount the standoffs before touching the Jetson boards, since it's much easier to build the frame empty than to retrofit boards into it later. Once the frame is built, seat each Orin Nano carrier board into its layer using the standoffs provided in the 52Pi kit, then connect the 120mm 5V RGB fan header to an available 5V GPIO or fan pin as shown in the manual.

The case fan handles ambient airflow across all three nodes; the per-board fan on each Orin Nano still handles direct SoC cooling and must remain connected. The SoC (system on a chip) integrates the CPU, GPU, memory controller, and other components into a single chip. The fan on the Orin Nano carrier board cools this critical component directly, while the case fan circulates air around the whole cluster.

The Orin Nano Developer Kit ships with a 19V power adapter. Connect it to the barrel jack on the carrier board. The board powers on automatically when connected; there is no power button to press.

Make sure each node's fan is connected. The developer kit includes an active fan, and it is mandatory for sustained workloads; connect it to the fan header on the carrier board before first boot.

Connect each Orin Nano to the TL-SG108E switch via Cat 6 Ethernet:

    nano-1 ──── Cat6 ──── TP-Link TL-SG108E port 1
    nano-2 ──── Cat6 ──── TP-Link TL-SG108E port 2
    nano-3 ──── Cat6 ──── TP-Link TL-SG108E port 3

Plug the switch into power. Wait 30 seconds for it to initialise.
Step 2: Install JetPack & First Boot

Follow the NVIDIA Jetson Orin Nano Developer Kit Quick Start Guide to get each node booted and set up. Two things to set consistently across all 3 nodes during setup:

- Username: same on every node (e.g. yuvrajsingh)
- Hostname: nano-1, nano-2, nano-3

After the wizard, enable SSH on each node:

    sudo systemctl enable --now ssh

Then set max performance mode. Default after install is 25W; switch to MAXN SUPER:

    sudo nvpmodel -m 2

Disconnect the monitor/keyboard. Everything from here is headless.

Step 4: Find Your IP Addresses

SSH into nano-1:

    ssh yuvrajsingh@nano-1.local

Find the IP:

    hostname -I

Output (example):

    192.168.1.11 172.17.0.1

The first address is your LAN IP. Ignore 172.17.0.1 (Docker's bridge). Repeat for all 3 Nanos.

Example IPs (yours will differ):

    nano-1: 192.168.1.11
    nano-2: 192.168.1.12
    nano-3: 192.168.1.13

Write these down. You need them for the next step.

Step 4b: Assign Private Subnet IPs (Recommended)

Why? A private subnet gives each Nano a stable, predictable address you control, isolates all cluster traffic to a known range, and makes SSH config, scripts, and inter-node communication unambiguous.

We'll add a static secondary IP on 10.10.1.x/24 to each Nano's eth0 alongside the existing DHCP address.

SSH into each Nano using its DHCP IP:

    ssh yuvrajsingh@192.168.1.11   # nano-1

Check the NetworkManager connection name:

    nmcli connection show

Output:

    NAME                UUID          TYPE      DEVICE
    Wired connection 1  a1b2c3d4-...  ethernet  eth0

Add the static private IP. On nano-1:

    sudo nmcli connection modify "Wired connection 1" +ipv4.addresses "10.10.1.1/24"
    sudo nmcli connection up "Wired connection 1"

Verify:

    ip addr show eth0

Expected:

2: eth0: