Show HN: cuTile Rust: Safe, data-race-free GPU kernels in Rust

NVIDIA Research has released cuTile Rust, a tile-based system for writing memory-safe, data-race-free GPU kernels in Rust. The project extends Rust's ownership model to GPU programming and reaches up to 92% of peak performance on NVIDIA B200 GPUs. cuTile Rust is available as an open-source research project and includes Grout, a Qwen3 inference engine built in collaboration with Hugging Face.

cuTile Rust

cutile-rs is a tile-based system for writing memory-safe, data-race-free GPU kernels in idiomatic Rust. It extends Rust's ownership discipline across the GPU launch boundary: mutable tensors are partitioned into disjoint pieces before launch, immutable tensors are shared, and generated launchers preserve ownership while GPU work is in flight. The same model supports synchronous launches, asynchronous pipelines, and CUDA graph replay.

The cutile::module macro embeds a captured Rust AST for each kernel in the host binary. When a kernel is needed, cuTile Rust JIT-compiles that AST through CUDA Tile IR into a GPU cubin. Local opt-outs remain available when lower-level control is needed.

We are excited to release this research project as a demonstration of how GPU programming can be made available in the Rust ecosystem. The software is at an early stage and under active development: expect bugs, incomplete features, and API breakage as we work to improve it. That said, we hope you will try it in your work and help shape its direction by giving feedback on your experience. Please see CONTRIBUTING.md (/NVlabs/cutile-rs/blob/main/CONTRIBUTING.md) if you are interested in contributing.

use cutile::prelude::*;

#[cutile::module]
mod kernel {
    use cutile::core::*;

    #[cutile::entry()]
    fn add