Wave – A universal GPU instruction set architecture

Source: github.com. Published 2026-05-26T13:56Z.

A new open-source project called WAVE introduces a vendor-neutral GPU instruction set architecture that lets developers write GPU code once and run the same binary on NVIDIA, AMD, Apple, and Intel hardware. The system, built on 11 hardware-invariant primitives and implemented in 34,000 lines of Rust, has been verified on Apple M4 Pro, NVIDIA T4, and AMD MI300X GPUs; Intel Arc verification is still pending. It reaches up to 3,587 GFLOPS on F32 matrix multiplication on the M4 Pro. WAVE aims to standardize GPU computing the way ARM standardized CPUs, enabling cross-vendor compatibility without code changes.

The ARM of GPU computing. One binary, any GPU.

WAVE is a vendor-neutral GPU instruction set architecture. Write GPU code once, run it on NVIDIA, AMD, Apple, and Intel GPUs unchanged. The same binary produces identical results on all four vendors. ARM defines what a CPU is so multiple vendors can build compatible chips. WAVE does the same for GPU computation.

- 11 hardware-invariant primitives across 4 GPU vendors
- 34,000 lines of Rust across 10 crates
- 618+ unit tests, 102/102 conformance tests passing
- Verified on Apple M4 Pro, NVIDIA T4, and AMD MI300X
- 3,587 GFLOPS F32 matrix multiply on M4 Pro (53.5% of Apple MPS)
- 89.29% CIFAR-10 accuracy via PyTorch integration, matching native exactly

Install with pip:

pip install wave-gpu

Or build from source:

git clone https://github.com/Oabraham1/wave.git
cd wave
for crate in wave-decode wave-asm wave-dis wave-emu wave-compiler wave-metal wave-ptx wave-hip wave-sycl wave-runtime; do
  (cd "$crate" && cargo build --release)
done

Basic usage from Python:

import wave_gpu as wg

device = wg.device()
print(f"Running on: {device}")

a = wg.array([1.0, 2.0, 3.0, 4.0])
b = wg.array([5.0, 6.0, 7.0, 8.0])
out = wg.zeros(4)

print(f"a: {a}")
print(f"b: {b}")

Compilation pipeline:

Source Code (Python / Rust / C++ / TypeScript)
  |
  v
wave-compiler ──> WAVE Binary (.wbin) ──> wave-emu (reference emulator)
                           |
      ┌─────────────┬──────┴──────┬─────────────┐
      v             v             v             v
  wave-metal    wave-ptx      wave-hip      wave-sycl
 (Apple MSL)    (NVIDIA)     (AMD ROCm)  (Intel oneAPI)
      |             |             |             |
      v             v             v             v
  Apple GPU    NVIDIA GPU      AMD GPU      Intel GPU

Crate	Purpose
`wave-decode`	Shared instruction decoder and binary format
`wave-asm`	Assembler (.wave text to .wbin binary)
`wave-dis`	Disassembler (.wbin binary to .wave text)
`wave-emu`	Reference emulator
`wave-compiler`	Multi-language compiler (Python/Rust/C++/TS to .wbin)
`wave-metal`	Apple Metal backend
`wave-ptx`	NVIDIA PTX backend
`wave-hip`	AMD HIP backend
`wave-sycl`	Intel SYCL backend
`wave-runtime`	SDK runtime with in-process compilation and kernel cache
`sdk/python`	Python SDK (`pip install wave-gpu`)

Each crate builds independently. No Cargo workspace.
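
The vendor-to-backend mapping in the crate table can be summarized in a few lines of Python. This is only an illustrative sketch: `BACKENDS`, `REFERENCE_EMULATOR`, and `backend_for` are hypothetical names, not part of the `wave_gpu` SDK, and falling back to `wave-emu` for an unknown vendor is an assumption. The crate names and targets are taken from the table.

```python
# Illustrative lookup only: BACKENDS and backend_for are hypothetical names,
# not part of the wave_gpu SDK. Crate names and targets come from the crate table.
BACKENDS = {
    "apple": ("wave-metal", "Apple Metal"),
    "nvidia": ("wave-ptx", "NVIDIA PTX"),
    "amd": ("wave-hip", "AMD HIP"),
    "intel": ("wave-sycl", "Intel SYCL"),
}

# wave-emu is the reference emulator; using it as a fallback is an assumption.
REFERENCE_EMULATOR = ("wave-emu", "Reference emulator")


def backend_for(vendor: str) -> tuple[str, str]:
    """Return (crate, target) for a vendor name, falling back to the emulator."""
    return BACKENDS.get(vendor.strip().lower(), REFERENCE_EMULATOR)


for vendor in ("Apple", "NVIDIA", "AMD", "Intel", "unknown"):
    crate, target = backend_for(vendor)
    print(f"{vendor:8s} -> {crate} ({target})")
```

Keeping the mapping in one table mirrors the diagram: a single `.wbin` binary fans out to one backend per vendor, with the emulator available as a vendor-independent reference.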

Auto-tuned results on Apple M4 Pro at 4096x4096 matrix size (MPS baseline: 6,710 GFLOPS):

Kernel	F32 GFLOPS	F16 GFLOPS	% of MPS (F32)
Blocked GEMM	3,587	4,049	53.5%
Fused GEMM+bias+ReLU	3,562	4,027	53.1%
Fused GEMM+bias+GELU	3,514	--	52.4%

Cross-vendor hardware verification:

Vendor	GPU	Status
Apple	M4 Pro	Verified
NVIDIA	T4	Verified
AMD	MI300X	Verified
Intel	Arc	Pending
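
One way to read "identical results" across vendors is bitwise equality of each backend's output with a reference such as the emulator. The sketch below shows that kind of check for F32 values. It is an assumption about how such a comparison could be made, not a description of WAVE's conformance suite; `f32_bits` and `matches_reference` are hypothetical helpers, and the sample values are placeholders rather than measured GPU output.

```python
import struct


def f32_bits(x: float) -> int:
    """Bit pattern of x after rounding to IEEE 754 binary32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def matches_reference(reference, candidate) -> bool:
    """True if both outputs have equal length and identical F32 bit patterns."""
    return len(reference) == len(candidate) and all(
        f32_bits(r) == f32_bits(c) for r, c in zip(reference, candidate)
    )


# Placeholder values for illustration, not measured output from any GPU.
reference = [6.0, 8.0, 10.0, 12.0]
outputs = {
    "backend_a": [6.0, 8.0, 10.0, 12.0],
    "backend_b": [6.0, 8.0, 10.0, 12.000001],  # off by one F32 ULP
}

for name, out in outputs.items():
    status = "matches" if matches_reference(reference, out) else "differs"
    print(f"{name}: {status}")
```

A bitwise check is stricter than a tolerance-based comparison: a one-ULP difference, as in `backend_b`, counts as a mismatch.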

Paper: "Toward a Universal GPU Instruction Set Architecture: A Cross-Vendor Analysis of Hardware-Invariant Computational Primitives in Parallel Processors" (Zenodo, 2026)
- arXiv preprint: 2603.28793
- Under review: International Journal of Parallel Programming (IJPP), April 2026

Venue targets: ASPLOS 2027, CGO 2026, MLSys, CAV

See CONTRIBUTING.md for the fork-based workflow, code standards, and testing requirements.

Apache License, Version 2.0. See LICENSE for terms.

Acknowledgments: Asahi Linux reverse engineering team, Dougall Johnson (GPU microarchitecture documentation), AMD GPUOpen, Google Colab (NVIDIA T4 verification), and DigitalOcean (AMD MI300X verification).
