
Wave – A universal GPU instruction set architecture

A new open-source project called WAVE introduces a vendor-neutral GPU instruction set architecture (ISA) that lets developers write GPU code once and run the same binary on NVIDIA, AMD, Apple, and Intel hardware. The system is built on 11 hardware-invariant primitives, implemented in 34,000 lines of Rust, and has been verified on Apple M4 Pro, NVIDIA T4, and AMD MI300X GPUs; Intel Arc verification is still pending. It reaches up to 3,587 GFLOPS on F32 matrix multiplication on the M4 Pro, 53.5% of Apple's MPS baseline. WAVE aims to standardize GPU computing the way ARM standardized CPUs, so that code moves across vendors without changes.

3 min read · Published May 26, 2026

The ARM of GPU computing. One binary, any GPU.

WAVE is a vendor-neutral GPU instruction set architecture. Write GPU code once, run it on NVIDIA, AMD, Apple, and Intel GPUs unchanged. The same binary produces identical results on all four vendors. ARM defines what a CPU is so multiple vendors can build compatible chips. WAVE does the same for GPU computation.

  • 11 hardware-invariant primitives across 4 GPU vendors
  • 34,000 lines of Rust across 10 crates
  • 618+ unit tests, 102/102 conformance tests passing
  • Verified on Apple M4 Pro, NVIDIA T4, and AMD MI300X
  • 3,587 GFLOPS F32 matrix multiply on M4 Pro (53.5% of Apple MPS)
  • 89.29% CIFAR-10 accuracy via PyTorch integration, matching native exactly

pip install wave-gpu

Or build from source:

git clone https://github.com/Oabraham1/wave.git
cd wave
for crate in wave-decode wave-asm wave-dis wave-emu wave-compiler wave-metal wave-ptx wave-hip wave-sycl wave-runtime; do
  (cd $crate && cargo build --release)
done

Quick start in Python:

import wave_gpu as wg

device = wg.device()
print(f"Running on: {device}")

a = wg.array([1.0, 2.0, 3.0, 4.0])
b = wg.array([5.0, 6.0, 7.0, 8.0])
out = wg.zeros(4)

print(f"a: {a}")
print(f"b: {b}")

Compilation pipeline:

Source Code (Python / Rust / C++ / TypeScript)
  |
  v
wave-compiler ──> WAVE Binary (.wbin) ──> wave-emu (reference emulator)
                        |
           ┌────────────┼────────────┬────────────┐
           v            v            v            v
       wave-metal   wave-ptx    wave-hip    wave-sycl
       (Apple MSL)  (NVIDIA)    (AMD ROCm)  (Intel oneAPI)
           |            |            |            |
           v            v            v            v
        Apple GPU    NVIDIA GPU   AMD GPU    Intel GPU
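
The fan-out in the pipeline diagram can be pictured as a lookup from GPU vendor to backend crate. The sketch below is illustrative only: `BACKENDS` and `select_backend` are hypothetical names, not part of the WAVE SDK, and falling back to the reference emulator for an unknown vendor is an assumption rather than documented behavior.

```python
# Hypothetical sketch: which backend crate from the diagram handles which vendor.
# None of these names come from the WAVE codebase.
BACKENDS = {
    "apple": ("wave-metal", "Apple MSL"),
    "nvidia": ("wave-ptx", "NVIDIA PTX"),
    "amd": ("wave-hip", "AMD ROCm"),
    "intel": ("wave-sycl", "Intel oneAPI"),
}


def select_backend(vendor: str) -> str:
    """Return the backend crate name for a vendor string.

    Assumption: an unrecognized vendor falls back to wave-emu,
    the reference emulator shown in the diagram.
    """
    key = vendor.strip().lower()
    if key in BACKENDS:
        return BACKENDS[key][0]
    return "wave-emu"


if __name__ == "__main__":
    for vendor in ("Apple", "NVIDIA", "AMD", "Intel", "other"):
        print(f"{vendor:>6} -> {select_backend(vendor)}")
```

The point of the layout is that the .wbin binary stays the same in every case; only the final translation step differs per vendor.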

Crate           Purpose
wave-decode     Shared instruction decoder and binary format
wave-asm        Assembler (.wave text to .wbin binary)
wave-dis        Disassembler (.wbin binary to .wave text)
wave-emu        Reference emulator
wave-compiler   Multi-language compiler (Python/Rust/C++/TS to .wbin)
wave-metal      Apple Metal backend
wave-ptx        NVIDIA PTX backend
wave-hip        AMD HIP backend
wave-sycl       Intel SYCL backend
wave-runtime    SDK runtime with in-process compilation and kernel cache
sdk/python      Python SDK (pip install wave-gpu)

Each crate builds independently. No Cargo workspace.

Auto-tuned results on Apple M4 Pro at 4096x4096 matrix size (MPS baseline: 6,710 GFLOPS):

Kernel                  F32 GFLOPS   F16 GFLOPS   % of MPS (F32)
Blocked GEMM            3,587        4,049        53.5%
Fused GEMM+bias+ReLU    3,562        4,027        53.1%
Fused GEMM+bias+GELU    3,514        --           52.4%
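
The percentage column can be reproduced by dividing each F32 figure by the 6,710 GFLOPS MPS baseline. The sketch below does that arithmetic and also estimates the time for one 4096x4096 multiply. It assumes the common convention of counting 2·N³ floating-point operations per square GEMM and ignores the extra bias and activation work in the fused kernels; the article does not state how WAVE counts operations, and the function names are illustrative.

```python
# Figures taken from the table above (Apple M4 Pro, 4096x4096, F32).
MPS_BASELINE_GFLOPS = 6710.0
N = 4096

RESULTS_F32 = {
    "Blocked GEMM": 3587.0,
    "Fused GEMM+bias+ReLU": 3562.0,
    "Fused GEMM+bias+GELU": 3514.0,
}


def percent_of_baseline(gflops: float, baseline: float = MPS_BASELINE_GFLOPS) -> float:
    """Throughput as a percentage of the baseline, rounded to one decimal."""
    return round(100.0 * gflops / baseline, 1)


def gemm_seconds(n: int, gflops: float) -> float:
    """Estimated time for one n x n x n GEMM, assuming 2*n^3 operations."""
    return 2.0 * n**3 / (gflops * 1e9)


if __name__ == "__main__":
    for kernel, gflops in RESULTS_F32.items():
        pct = percent_of_baseline(gflops)
        ms = 1e3 * gemm_seconds(N, gflops)
        print(f"{kernel:<22} {pct:5.1f}% of MPS, ~{ms:.1f} ms per multiply")
```

Under that operation count, the blocked GEMM works out to roughly 38 ms per 4096x4096 multiply.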

Cross-vendor hardware verification:

Vendor   GPU      Status
Apple    M4 Pro   Verified
NVIDIA   T4       Verified
AMD      MI300X   Verified
Intel    Arc      Pending

Toward a Universal GPU Instruction Set Architecture: A Cross-Vendor Analysis of Hardware-Invariant Computational Primitives in Parallel Processors (Zenodo, 2026)

  • arXiv preprint: 2603.28793
  • Under review: International Journal of Parallel Programming (IJPP), April 2026
  • Venue targets: ASPLOS 2027, CGO 2026, MLSys, CAV

See CONTRIBUTING.md for the fork-based workflow, code standards, and testing requirements.

Apache License, Version 2.0. See LICENSE for terms.

Acknowledgments: the Asahi Linux reverse engineering team, Dougall Johnson (GPU microarchitecture documentation), AMD GPUOpen, Google Colab (NVIDIA T4 verification), and DigitalOcean (AMD MI300X verification).
