{"slug": "show-hn-navatala-gpu-multi-back-end-gpu-kernels-and-python-bindings", "title": "Show HN: Navatala GPU – multi-back end GPU kernels and Python bindings", "summary": "Navatala Systems released Navatala GPU, an open-source cross-platform GPU compute runtime and kernel corpus supporting CUDA, HIP, Vulkan, OpenCL, and Metal backends. The alpha release includes Python bindings on PyPI and targets scientific computing workloads such as CFD and machine learning.", "body_md": "Cross-platform GPU compute runtime and kernel corpus for scientific computing, released under the Apache License 2.0.\n\nThe goal is a portable, inspectable GPU library that can run across ROCm/HIP, CUDA, Metal, Vulkan compute, and OpenCL, while still dispatching to vendor libraries where those are the best backend for an operation.\n\nThis distribution bundles two cooperating layers, plus an example orchestrator:\n\n- `runtime/` — a C++20 abstraction that presents one API over CUDA, HIP, Vulkan compute, OpenCL, and Metal. Handles device enumeration, memory allocation (device, pinned, managed), execution queues, event-based synchronization, CUDA/HIP graph capture, and a small stable C++ facade for common operations such as `navatala::linalg::axpy`.\n\n- `kernels/` — a corpus of compute kernels covering finite-volume CFD primitives, algebraic multigrid (AMG), classical iterative solvers (CG, BiCGSTAB, IDR, GMRES), sparse and dense BLAS, and a cross-platform machine-learning library (clustering, regression, KNN, decision trees, SVM, ARIMA, SHAP, UMAP, and more). Kernels ship in five backend forms (CUDA, HIP, OpenCL, Vulkan compute + SPIR-V, Metal) with consistent behaviour across vendors. Per-backend coverage is **not uniform** — see `docs/BACKEND_COVERAGE.md` for the current matrix.\n\n  A host-side kernel registry that wraps the kernel files for runtime lookup ships under `runtime/include/navatala/` (header) and `runtime/src/internal/` (source). 
It ships as code but does not carry a CMakeLists.txt in this release.\n\n- `orchestrator/` — example host orchestrator code built on the runtime, demonstrating how the CFD kernels compose into a Volume-of-Fluid pressure-projection workflow (`Navatala::Cfd::VofPressureOrchestrator`). Worked example, not a production solver; ships as code without a turnkey CMakeLists.\n\nThis is a developer-preview / alpha release. The runtime library and kernel corpus are both in active use for CFD workloads, but the public packaging, documentation, CI matrix, and backend conformance reports are still being expanded.\n\nThe Python package is available on PyPI:\n\n```\npip install navatala-gpu\n```\n\nImporting the package and inspecting its metadata does not require a GPU. Actual GPU execution requires a compatible backend runtime and the native extension for the selected backend.\n\n```python\nimport navatala_gpu as ng\nfrom navatala_gpu import linalg\n\nprint(\"navatala-gpu\", ng.__version__, \"ABI\", ng.__abi_version__)\nprint(\"linalg ops:\", \", \".join(linalg.list_bindings()))\nprint(\"HIP AXPY in manifest:\",\n      ng.supports(\"linalg.axpy\", backend=\"hip\", dtype=\"float32\"))\nprint(\"known backends:\", sorted(ng.get_capabilities()[\"backends\"].keys()))\n```\n\nFor compute calls, pass DLPack-compatible tensors to APIs such as\n`linalg.axpy`, `linalg.gemm`, and `sparse.csr_spmv`. The bindings validate\nshape, dtype, and backend support before dispatch.\n\nPrerequisites depend on the backends you enable.\n\n| Backend | Required at build time |\n|---|---|\n| CUDA | CUDA Toolkit 11.0+ (`nvcc`, NVRTC, CUDA driver) |\n| HIP | ROCm 5.0+ (`hipcc`, hipRTC) |\n| Vulkan | Vulkan SDK with `glslc` for GLSL→SPIR-V compilation |\n| OpenCL | OpenCL 1.2+ headers and ICD loader |\n| Metal | macOS 11+ with Xcode Command Line Tools |\n\n```\ncmake -S . 
-B build\ncmake --build build -j\n\n# Run tests (requires at least one GPU backend to be available)\nctest --test-dir build --output-on-failure\n```\n\nDisable backends you don't need:\n\n```\ncmake -S . -B build \\\n    -DNAVATALA_GPU_USE_CUDA=OFF \\\n    -DNAVATALA_GPU_USE_HIP=ON \\\n    -DNAVATALA_GPU_USE_VULKAN=OFF \\\n    -DNAVATALA_GPU_USE_OPENCL=OFF\n```\n\nComplete, runnable examples are in [examples/](/navatala-systems/navatala_gpu/blob/main/examples). The C ABI example\nuses `navatala_gpu_axpy_f32`; the C++ wrapper example uses\n`navatala::resources`, `navatala::buffer`, and `navatala::linalg::axpy`.\nAfter building, run:\n\n```\n./build/examples/axpy_example\n./build/examples/wrapper_axpy_example\n```\n\nBoth examples exit 0 with a `[skip]` message on hosts without a GPU, so they\nare safe to wire into CI even on CPU-only runners.\n\nFor a fuller tour, see [docs/ARCHITECTURE.md](/navatala-systems/navatala_gpu/blob/main/docs/ARCHITECTURE.md).\n\nThe repository includes dated MI300X benchmark fixtures under\n[benchmarks/fixtures/hardware_runs/](/navatala-systems/navatala_gpu/blob/main/benchmarks/fixtures/hardware_runs).\nRecent HIP runs compare generated kernels and public wrapper dispatch against\nrocBLAS, rocSPARSE, and hipSPARSELt. 
Exact commands, JSON fixtures, and summary\nreports are documented in\n[docs/benchmarks/ROCM_VENDOR_BENCHMARKS.md](/navatala-systems/navatala_gpu/blob/main/docs/benchmarks/ROCM_VENDOR_BENCHMARKS.md).\n\n- `docs/ARCHITECTURE.md` — how runtime and kernels fit together.\n- `docs/BACKENDS.md` — per-backend capabilities and limitations.\n- `docs/KERNELS.md` — what's in the kernel corpus and how to read it.\n- `docs/BACKEND_COVERAGE.md` — generated backend coverage matrix.\n- `docs/NUMERICAL_CONFORMANCE.md` — validation status and pending backend evidence.\n- `docs/TUNING_ROADMAP.md` — selective backend tuning priorities and benchmark evidence rules.\n- `docs/benchmarks/ROCM_VENDOR_BENCHMARKS.md` — optional HIP benchmark harness comparing selected generated kernels against rocBLAS, rocSPARSE, and hipSPARSELt.\n- `docs/benchmarks/ROCM_VALIDATION_TEMPLATE.md` — template for public ROCm correctness/benchmark reports.\n- `docs/benchmarks/METAL_VALIDATION.md` — Apple Silicon validation and opt-in Metal runtime tuning guide.\n- `docs/KERNEL_INDEX.md` — generated domain-grouped kernel index.\n- `docs/PUBLIC_PRIVATE_BOUNDARY.md` — what is public, private, and generated.\n- `docs/SBOM.md` — dependency and license summary for the release tree.\n- `docs/PYPI_RELEASE.md` — TestPyPI/PyPI release procedure.\n- `docs/ALPHA_RELEASE_CHECKLIST.md` — release-readiness checklist.\n- `docs/release/ALPHA_0_1_1_EVIDENCE.md` — local alpha-candidate gate evidence.\n- `SECURITY.md` — vulnerability reporting policy.\n\nSee [CONTRIBUTING.md](/navatala-systems/navatala_gpu/blob/main/CONTRIBUTING.md). External contributions to the\nhand-authored layers — runtime, examples, docs, tests, and tooling — are\nwelcome through the normal pull-request flow. 
The kernel sources are\nregenerated as a unit; the contribution model for those paths is documented\nin CONTRIBUTING.md.\n\nFor bug reports, backend validation results, or technical questions, open a\nGitHub Issue at [https://github.com/navatala-systems/navatala_gpu/issues](https://github.com/navatala-systems/navatala_gpu/issues).\n\nThe kernel sources under `kernels/{cuda,hip,opencl,vulkan,metal}/` and the\ngenerated Python facade modules under `python/navatala_gpu/` are produced\nfrom an upstream specification and regenerated together per release. The\n`kernels/manifest.json` file is the machine-readable provenance record;\n[docs/KERNEL_INDEX.md](/navatala-systems/navatala_gpu/blob/main/docs/KERNEL_INDEX.md) and\n[docs/BACKEND_COVERAGE.md](/navatala-systems/navatala_gpu/blob/main/docs/BACKEND_COVERAGE.md) are rendered from it.\nSee [CONTRIBUTING.md](/navatala-systems/navatala_gpu/blob/main/CONTRIBUTING.md) for how patches against these paths\nare routed.\n\nApache License 2.0. 
See [LICENSE](/navatala-systems/navatala_gpu/blob/main/LICENSE) and [NOTICE](/navatala-systems/navatala_gpu/blob/main/NOTICE).\n\n```\nCopyright (c) 2026 Navatala Systems (OPC) Pvt Ltd\n```\n\n", "url": "https://wpnews.pro/news/show-hn-navatala-gpu-multi-back-end-gpu-kernels-and-python-bindings", "canonical_source": "https://github.com/navatala-systems/navatala_gpu", "published_at": "2026-06-25 14:58:48+00:00", "updated_at": "2026-06-25 15:14:28.465456+00:00", "lang": "en", "topics": ["ai-infrastructure", "developer-tools", "machine-learning", "artificial-intelligence"], "entities": ["Navatala Systems", "Navatala GPU", "CUDA", "HIP", "Vulkan", "OpenCL", "Metal", "PyPI"], "alternates": {"html": "https://wpnews.pro/news/show-hn-navatala-gpu-multi-back-end-gpu-kernels-and-python-bindings", "markdown": "https://wpnews.pro/news/show-hn-navatala-gpu-multi-back-end-gpu-kernels-and-python-bindings.md", "text": "https://wpnews.pro/news/show-hn-navatala-gpu-multi-back-end-gpu-kernels-and-python-bindings.txt", "jsonld": "https://wpnews.pro/news/show-hn-navatala-gpu-multi-back-end-gpu-kernels-and-python-bindings.jsonld"}}