Thunderbolt-ibverbs: We have InfiniBand at home

A developer created a Linux kernel module and userspace shim that transforms a standard USB4 connection into a low-latency InfiniBand device, enabling distributed AI inference across two 128GB Strix Halo mini PCs. The project achieved sustained bidirectional throughput of approximately 95 Gb/s and one-way latency of 7 microseconds, significantly outperforming onboard 2.5 GbE and soft-RoCE over Thunderbolt networking. The code, generated primarily by AI models, is released for research use only with no warranty.

I spent the past few weeks working on this project and thought it might be interesting to write up a technical report on it: the motivation, the process, the learnings, etc.

DISCLAIMER: all of the code in this repo (https://github.com/hellas-ai/thunderbolt-ibverbs) is AI-generated, mostly by Codex 5.5 and Opus 4.7. While I made an effort to understand enough of it to keep it on track, I almost certainly failed in many instances, and I'm sure the code contains many false assumptions, hallucinations and plain stupidity. No warranty or guarantee offered; for research use only, not for human consumption.

TL;DR. We write a Linux kernel module and userspace shim to pretend our generic USB4 connection is a low-latency, high-performance InfiniBand device, and use it to perform distributed inference across two 128GB Strix Halo mini PCs. Basic interop with Apple's native protocol is functional.

~48 Gb/s per direction (~95 Gb/s bidirectional total) sustained ib_write_bw, 4-HCA aggregate at 1 MiB / 8 QPs with IOMMU off, vs ~2.3 Gb/s over the onboard 2.5 GbE and ~9 Gb/s for soft-RoCE on top of thunderbolt-net at the per-rail level.

~7 µs one-way ib_write_lat at 64 B, single QP, vs ~28 µs over RXE/2.5 GbE and ~65 µs over RXE/TBnet.
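
To put the TL;DR figures in perspective, here is a minimal sketch that turns them into ratios and a rough wire time. It only uses the approximate numbers quoted above. The helper names (`ratios`, `wire_time_us`) are made up for illustration and are not part of the repo. The throughput comparison is naive: the thunderbolt-ibverbs figure is a 4-HCA aggregate while the soft-RoCE baseline is quoted per rail, so treat the ratios as ballpark only.

```python
# Approximate figures as quoted in the TL;DR (not re-measured here).
THROUGHPUT_GBPS = {
    "thunderbolt-ibverbs": 48.0,            # per direction, 4-HCA aggregate
    "2.5 GbE": 2.3,
    "soft-RoCE over thunderbolt-net": 9.0,  # quoted at the per-rail level
}
LATENCY_US = {
    "thunderbolt-ibverbs": 7.0,             # one-way, 64 B, single QP
    "RXE over 2.5 GbE": 28.0,
    "RXE over thunderbolt-net": 65.0,
}


def ratios(table, reference, higher_is_better):
    """Return how many times better `reference` is than each other entry."""
    ref = table[reference]
    result = {}
    for name, value in table.items():
        if name == reference:
            continue
        # For throughput a larger value wins; for latency a smaller one does.
        result[name] = ref / value if higher_is_better else value / ref
    return result


def wire_time_us(num_bytes, gbps):
    """Idealized time in microseconds to move num_bytes at gbps (no overhead)."""
    bits = num_bytes * 8
    return bits / (gbps * 1e9) * 1e6


throughput_gain = ratios(THROUGHPUT_GBPS, "thunderbolt-ibverbs", True)
latency_gain = ratios(LATENCY_US, "thunderbolt-ibverbs", False)

for name, gain in throughput_gain.items():
    print(f"throughput vs {name}: {gain:.1f}x")
for name, gain in latency_gain.items():
    print(f"latency vs {name}: {gain:.1f}x")
print(f"1 MiB at 48 Gb/s: {wire_time_us(1 << 20, 48.0):.1f} us")
```

With these inputs the throughput advantage comes out at roughly 21x over 2.5 GbE and roughly 5x over soft-RoCE on thunderbolt-net, and the latency advantage at 4x and roughly 9x respectively. The idealized wire time for a 1 MiB message at 48 Gb/s is about 175 µs, which is large next to the ~7 µs small-message latency; that is presumably why the bandwidth test uses large messages while the latency test uses 64 B ones.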