Show HN: Navatala GPU – multi-backend GPU kernels and Python bindings

Navatala Systems released Navatala GPU, an open-source cross-platform GPU compute runtime and kernel corpus supporting CUDA, HIP, Vulkan, OpenCL, and Metal backends. The alpha release includes Python bindings on PyPI and targets scientific computing workloads such as CFD and machine learning.

Cross-platform GPU compute runtime and kernel corpus for scientific computing, released under the Apache License 2.0. The goal is a portable, inspectable GPU library that runs across ROCm/HIP, CUDA, Metal, Vulkan compute, and OpenCL, while still dispatching to vendor libraries where those are the best backend for an operation.

This distribution bundles two cooperating layers, plus example orchestration code:

- `runtime/`: a C++20 abstraction that presents one API over CUDA, HIP, Vulkan compute, OpenCL, and Metal. It handles device enumeration, memory allocation (device, pinned, managed), execution queues, event-based synchronization, CUDA/HIP graph capture, and a small stable C++ facade for common operations such as `navatala::linalg::axpy`.
- `kernels/`: a corpus of compute kernels covering finite-volume CFD primitives, algebraic multigrid (AMG), classical iterative solvers (CG, BiCGSTAB, IDR, GMRES), sparse and dense BLAS, and a cross-platform machine-learning library (clustering, regression, KNN, decision trees, SVM, ARIMA, SHAP, UMAP, and more). Kernels ship in five backend forms (CUDA, HIP, OpenCL, Vulkan compute + SPIR-V, Metal) with consistent behaviour across vendors. Per-backend coverage is not uniform; see `docs/BACKEND_COVERAGE.md` for the current matrix.

  A host-side kernel registry that wraps the kernel files for runtime lookup ships under `runtime/include/navatala/` (header) and `runtime/src/internal/` (source). It ships as code but does not carry a `CMakeLists.txt` in this release.
- `orchestrator/`: example host orchestrator code built on the runtime, demonstrating how the CFD kernels compose into a Volume-of-Fluid pressure-projection workflow (`Navatala::Cfd::VofPressureOrchestrator`). It is a worked example, not a production solver, and ships as code without a turnkey `CMakeLists.txt`.

This is a developer-preview / alpha release. The runtime library and kernel corpus are both in active use for CFD workloads, but the public packaging, documentation, CI matrix, and backend conformance reports are still being expanded.

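
To make the idea of a kernel registry with runtime lookup more concrete, here is a minimal sketch in plain Python. It is not the registry that ships under `runtime/`; the class names, the `priority` field, and the entry labels are all hypothetical, and a CPU list comprehension stands in for real GPU kernels. It only illustrates how a lookup keyed by operation, backend, and dtype could prefer a vendor library where one is registered and fall back to a portable kernel otherwise.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

Key = Tuple[str, str, str]  # (operation, backend, dtype)


@dataclass(frozen=True)
class KernelEntry:
    """One candidate implementation (hypothetical structure)."""
    name: str                          # label for the implementation
    impl: Callable[..., List[float]]   # callable that performs the operation
    priority: int                      # higher wins when several entries match


@dataclass
class KernelRegistry:
    """Toy registry: maps (op, backend, dtype) to candidate kernels."""
    entries: Dict[Key, List[KernelEntry]] = field(default_factory=dict)

    def register(self, op: str, backend: str, dtype: str, entry: KernelEntry) -> None:
        self.entries.setdefault((op, backend, dtype), []).append(entry)

    def supports(self, op: str, backend: str, dtype: str) -> bool:
        return bool(self.entries.get((op, backend, dtype)))

    def lookup(self, op: str, backend: str, dtype: str) -> KernelEntry:
        candidates = self.entries.get((op, backend, dtype))
        if not candidates:
            raise LookupError(f"no kernel for {op} on {backend}/{dtype}")
        # Pick the highest-priority candidate for this key.
        return max(candidates, key=lambda e: e.priority)


def axpy_reference(alpha: float, x: Sequence[float], y: Sequence[float]) -> List[float]:
    """CPU stand-in for a real kernel: returns alpha * x + y elementwise."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [alpha * xi + yi for xi, yi in zip(x, y)]


registry = KernelRegistry()
# Entry labels are illustrative only; every entry uses the same CPU stand-in.
registry.register("linalg.axpy", "hip", "float32",
                  KernelEntry("portable-hip", axpy_reference, priority=0))
registry.register("linalg.axpy", "hip", "float32",
                  KernelEntry("vendor-rocblas", axpy_reference, priority=10))
registry.register("linalg.axpy", "vulkan", "float32",
                  KernelEntry("portable-spirv", axpy_reference, priority=0))

chosen = registry.lookup("linalg.axpy", "hip", "float32")
result = chosen.impl(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(chosen.name, result)
```

Keying on the full (operation, backend, dtype) triple keeps the "is this supported?" question and the "which implementation runs?" question in one table, which is one plausible way to back a capability query alongside dispatch.
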
The Python package is available on PyPI:

```
pip install navatala-gpu
```

Importing the package and inspecting its metadata does not require a GPU. Actual GPU execution requires a compatible backend runtime and the native extension for the selected backend.

```python
import navatala_gpu as ng
from navatala_gpu import linalg

print("navatala-gpu", ng.__version__, "ABI", ng.__abi_version__)
print("linalg ops:", ", ".join(linalg.list_bindings()))
print("HIP AXPY in manifest:", ng.supports("linalg.axpy", backend="hip", dtype="float32"))
print("known backends:", sorted(ng.get_capabilities()["backends"].keys()))
```

For compute calls, pass DLPack-compatible tensors to APIs such as `linalg.axpy`, `linalg.gemm`, and `sparse.csr_spmv`. The bindings validate shape, dtype, and backend support before dispatch.

Prerequisites depend on the backends you enable.

| Backend | Required at build time |
|---|---|
| CUDA | CUDA Toolkit 11.0+ (`nvcc`, NVRTC, CUDA driver) |
| HIP | ROCm 5.0+ (`hipcc`, hipRTC) |
| Vulkan | Vulkan SDK with `glslc` for GLSL→SPIR-V compilation |
| OpenCL | OpenCL 1.2+ headers and ICD loader |
| Metal | macOS 11+ with Xcode Command Line Tools |

```
cmake -S . -B build
cmake --build build -j

# Run tests (requires at least one GPU backend to be available)
ctest --test-dir build --output-on-failure
```

Disable backends you don't need:

```
cmake -S . -B build \
  -DNAVATALA_GPU_USE_CUDA=OFF \
  -DNAVATALA_GPU_USE_HIP=ON \
  -DNAVATALA_GPU_USE_VULKAN=OFF \
  -DNAVATALA_GPU_USE_OPENCL=OFF
```

Complete, runnable examples are in `examples/`. The C ABI example uses `navatala_gpu_axpy_f32`; the C++ wrapper example uses `navatala::resources`, `navatala::buffer`, and `navatala::linalg::axpy`. After building, run:

```
./build/examples/axpy_example
./build/examples/wrapper_axpy_example
```

Both examples exit 0 with a skip message on hosts without a GPU, so they are safe to wire into CI even on CPU-only runners.

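
As a rough illustration of wiring the two example binaries into CI, the sketch below runs each one and classifies the outcome by exit code. It is an assumption-laden helper, not something shipped in the repository: the function names are hypothetical, and the binary paths are assumed to match the run commands above (adjust them if your build directory differs). Because the examples are described as exiting 0 with a skip message on hosts without a GPU, the script simply echoes their output and treats exit code 0 as success.

```python
import subprocess
from pathlib import Path
from typing import Dict, List

# Assumed locations of the example binaries after `cmake --build build`.
EXAMPLES: List[Path] = [
    Path("build/examples/axpy_example"),
    Path("build/examples/wrapper_axpy_example"),
]


def run_example(binary: Path, timeout_s: float = 60.0) -> str:
    """Run one example binary and return 'missing', 'ok', or 'failed(<code>)'."""
    if not binary.is_file():
        return "missing"
    proc = subprocess.run(
        [str(binary)],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    # Echo the program's output so any skip message is visible in CI logs.
    if proc.stdout:
        print(proc.stdout, end="")
    if proc.stderr:
        print(proc.stderr, end="")
    return "ok" if proc.returncode == 0 else f"failed({proc.returncode})"


def main() -> int:
    """Run all examples; return 0 only if every binary exists and exits 0."""
    results: Dict[str, str] = {str(p): run_example(p) for p in EXAMPLES}
    for name, status in results.items():
        print(f"{name}: {status}")
    # A missing binary counts as a failure: the build step did not produce it.
    return 0 if all(status == "ok" for status in results.values()) else 1
```

A CI job could call `sys.exit(main())` from a small entry script after the build step. Treating a missing binary as a failure is a design choice of this sketch; it distinguishes "the example skipped because there is no GPU" from "the example was never built".
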
For a fuller tour, see `docs/ARCHITECTURE.md`.

The repository includes dated MI300X benchmark fixtures under `benchmarks/fixtures/hardware_runs/`. Recent HIP runs compare generated kernels and public wrapper dispatch against rocBLAS, rocSPARSE, and hipSPARSELt. Exact commands, JSON fixtures, and summary reports are documented in `docs/benchmarks/ROCM_VENDOR_BENCHMARKS.md`.

- `docs/ARCHITECTURE.md`: how runtime and kernels fit together.
- `docs/BACKENDS.md`: per-backend capabilities and limitations.
- `docs/KERNELS.md`: what's in the kernel corpus and how to read it.
- `docs/BACKEND_COVERAGE.md`: generated backend coverage matrix.
- `docs/NUMERICAL_CONFORMANCE.md`: validation status and pending backend evidence.
- `docs/TUNING_ROADMAP.md`: selective backend tuning priorities and benchmark evidence rules.
- `docs/benchmarks/ROCM_VENDOR_BENCHMARKS.md`: optional HIP benchmark harness comparing selected generated kernels against rocBLAS, rocSPARSE, and hipSPARSELt.
- `docs/benchmarks/ROCM_VALIDATION_TEMPLATE.md`: template for public ROCm correctness/benchmark reports.
- `docs/benchmarks/METAL_VALIDATION.md`: Apple Silicon validation and opt-in Metal runtime tuning guide.
- `docs/KERNEL_INDEX.md`: generated domain-grouped kernel index.
- `docs/PUBLIC_PRIVATE_BOUNDARY.md`: what is public, private, and generated.
- `docs/SBOM.md`: dependency and license summary for the release tree.
- `docs/PYPI_RELEASE.md`: TestPyPI/PyPI release procedure.
- `docs/ALPHA_RELEASE_CHECKLIST.md`: release-readiness checklist.
- `docs/release/ALPHA_0_1_1_EVIDENCE.md`: local alpha-candidate gate evidence.
- `SECURITY.md`: vulnerability reporting policy.

See `CONTRIBUTING.md`.

External contributions to the hand-authored layers (runtime, examples, docs, tests, and tooling) are welcome through the normal pull-request flow. The kernel sources are regenerated as a unit; the contribution model for those paths is documented in `CONTRIBUTING.md`.

For bug reports, backend validation results, or technical questions, open a GitHub Issue at https://github.com/navatala-systems/navatala_gpu/issues.

The kernel sources under `kernels/{cuda,hip,opencl,vulkan,metal}/` and the generated Python facade modules under `python/navatala_gpu/` are produced from an upstream specification and regenerated together per release. The `kernels/manifest.json` file is the machine-readable provenance record; `docs/KERNEL_INDEX.md` and `docs/BACKEND_COVERAGE.md` are rendered from it. See `CONTRIBUTING.md` for how patches against these paths are routed.

Apache License 2.0. See `LICENSE` and `NOTICE`.

Copyright (c) 2026 Navatala Systems OPC Pvt Ltd