4× RTX PRO 6000 Blackwell on Water, and the One Card That Wouldn't Behave

A water-cooled rig with four RTX PRO 6000 Blackwell GPUs kept failing during sustained model training because one card repeatedly dropped off the PCIe bus under load. The cause was a missing power inductor that came off the board during the waterblock conversion, leaving the card unable to handle its 600-watt power draw. The other three cards ran cleanly after the custom loop was installed.

This rig exists to train models, not serve them. Four RTX PRO 6000 Blackwell cards in one chassis at 600 W each is 2.4 kW of heat to evict, and training runs are hours-to-days long with every card pinned at full TDP. Air coolers can do it for an inference burst; they cannot do it for a multi-day training job: the fans get loud, the cards stack their exhaust into each other, and the first one to thermal-throttle stalls the whole synchronous step.

So we converted all four to waterblocks. Most of the build went fine. One didn’t, and the reason was sitting on the workbench.

This post is the short version: what we did, what broke, how we found it, and where we landed.

The rig

- 4× RTX PRO 6000 Blackwell Workstation (GB202, 96 GB GDDR7, 600 W)
- Threadripper Pro 7995WX on WRX90
- 4× Bykski waterblocks (full-cover, GPU + VRM + memory front-side)
- Custom loop: single distro/reservoir, two pumps, distilled water, two Alphacool NexXxoS XT45 Full Copper 1260 mm Super Nova radiators (9× 140 mm fans each), four GPUs plumbed in parallel
- 2× 1500 W PSUs (3 kW total budget) to feed the ~2.4 kW sustained draw; the AC circuit was upgraded mid-build after an earlier all-cards-down event under load

The waterblocks themselves are straightforward: pull the stock cooler, clean the die, fresh paste on the GPU, thermal pads on memory and VRMs, torque the block down in a star pattern.
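
As a rough sanity check on the rig's numbers, the power and heat budget can be sketched in a few lines of Python. The GPU and PSU figures come from the parts list above; the 350 W CPU figure and the 4 L/min loop flow rate are assumptions for illustration, not measurements from this build, and the function names are hypothetical.

```python
# Back-of-envelope power and coolant budget for the rig.
# GPU and PSU numbers are from the parts list; everything marked
# "assumption" is illustrative and was not measured on this build.

GPU_COUNT = 4
GPU_WATTS = 600.0        # per-card power limit
PSU_COUNT = 2
PSU_WATTS = 1500.0       # per-PSU rating
CPU_WATTS = 350.0        # assumption: rated TDP of the 7995WX, not a measured draw

WATER_SPECIFIC_HEAT = 4186.0  # J/(kg*K), liquid water near room temperature
WATER_KG_PER_LITRE = 1.0      # close enough for a rough estimate


def gpu_heat_watts(count: int = GPU_COUNT, per_card: float = GPU_WATTS) -> float:
    """Total heat the loop has to move when every card is at its power limit."""
    return count * per_card


def psu_headroom_watts(cpu_watts: float = CPU_WATTS) -> float:
    """PSU capacity left after the GPUs and CPU, ignoring conversion losses."""
    return PSU_COUNT * PSU_WATTS - gpu_heat_watts() - cpu_watts


def coolant_delta_t(heat_watts: float, flow_l_per_min: float) -> float:
    """Steady-state water temperature rise across the heat load, in kelvin.

    Uses delta_T = P / (m_dot * c_p), with m_dot the mass flow in kg/s.
    """
    mass_flow_kg_s = flow_l_per_min * WATER_KG_PER_LITRE / 60.0
    return heat_watts / (mass_flow_kg_s * WATER_SPECIFIC_HEAT)


if __name__ == "__main__":
    heat = gpu_heat_watts()
    print(f"GPU heat load: {heat:.0f} W")
    print(f"PSU headroom after GPUs + CPU: {psu_headroom_watts():.0f} W")
    # assumption: 4 L/min total loop flow
    print(f"Coolant rise at 4 L/min: {coolant_delta_t(heat, 4.0):.1f} K")
```

Under those assumptions the headroom left for pumps, fans, and the rest of the system is fairly thin, and the coolant temperature rise scales inversely with flow rate, so halving the flow doubles the rise.
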
The catch on these cards is the backplate: the memory packages on the back also need cooling, which means either pads against the case panel or small finned heatsinks glued on with thermal adhesive. I went with HOAOH 2.0 W/m·K tape on most spots and GENNEL G109 thermal adhesive where I needed something that wouldn’t migrate.

The card that wouldn’t behave

Three cards came up clean. The fourth, GPU 1 on this rig, would idle fine, then fall off the bus under load. The dmesg signature was always the same:

NVRM: Xid PCI:0000:02:00 : 79, pid='