03:32
2026-05-27
dev.to
large-language-models
I built a Rust inference engine that streams MoE expert weights from NVMe SSDs, no GPU required
A developer built Micro-Expert-Router, a Rust inference engine that streams Mixture-of-Experts model weights directly from NVMe SSDs using io_uring with O_DIRECT, eliminating the need for GPU VRAM. Th…