Show HN: Alloy – a PyTorch backend and inference engine for Apple Silicon

Alloy, a new PyTorch backend and inference engine for Apple Silicon, has been released as a technical preview. The open-source project compiles Python GPU kernels to Metal and supports LLM serving with an API that is drop-in compatible with OpenAI, Anthropic, and Ollama clients. Alloy includes a torch.compile backend and features such as warm-prefix KV reuse, on-GPU sampling, and speculative decoding.

Kernel authoring DSL, torch.compile backend and LLM serving for Apple Silicon.

Alloy is a compiler and runtime for GPU compute kernels on Apple Silicon. You write kernels in Python. Alloy compiles them to Metal through a tile IR pipeline, covering everything from per-thread scalar kernels to cooperative tiled GEMM with simdgroup MMA and automatic operator fusion for multi-kernel pipelines.

Status: technical preview. Requires Apple Silicon (M1+) and macOS 13+. The Python packages need Python 3.10–3.12.

Install, Inference server - Quickstart, torch.compile backend, Benchmarks, Writing kernels, Why Alloy, Contributing, License

Python (pip / uv)

    pip install 'alloy-kit[serve]'   # local LLM server + CLI + torch.compile backend
    pip install alloy-kit            # lean: just the GPU kernel compiler (no torch)
    pip install 'alloy-kit[all]'     # + training / vision / audio research extras

    import alloy as al

The PyPI distribution is alloy-kit. The brackets are optional dependency groups: the lean base provides @al.kernel with the tile IR, MSL emitter and Metal dispatch machinery, and serve adds everything needed to run the server and the alloy CLI.

Standalone (no Python required):

    curl -fsSL https://raw.githubusercontent.com/rayanht/alloy/main/installer/install.sh | sh

Installs a self-contained alloy CLI into /usr/local.

From source (contributors): see Contributing.
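The stated requirements (Apple Silicon, macOS 13+, Python 3.10–3.12) can be checked before installing. The following is a minimal sketch using only the Python standard library; it is not part of Alloy, and the function name check_requirements is hypothetical. One caveat: a Python interpreter running under Rosetta reports x86_64 even on Apple Silicon, so this check would flag it.

```python
import platform
import sys


def check_requirements(py_version, system, machine, macos_version):
    """Return a list of problems; an empty list means the stated requirements are met."""
    problems = []
    # The Python packages are stated to need Python 3.10 through 3.12.
    if not ((3, 10) <= tuple(py_version[:2]) <= (3, 12)):
        problems.append("Python 3.10-3.12 is required")
    if system != "Darwin":
        problems.append("macOS is required")
    if machine != "arm64":
        problems.append("Apple Silicon (arm64) is required")
    # platform.mac_ver() yields "" off macOS, so guard before parsing.
    major = int(macos_version.split(".")[0]) if macos_version else 0
    if system == "Darwin" and major < 13:
        problems.append("macOS 13 or newer is required")
    return problems


if __name__ == "__main__":
    issues = check_requirements(
        sys.version_info,
        platform.system(),
        platform.machine(),
        platform.mac_ver()[0],
    )
    print("OK" if not issues else "\n".join(issues))
```

Passing the platform values in as arguments, rather than reading them inside the function, keeps the check easy to exercise on any machine.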
Alloy serves a loopback HTTP API that's drop-in compatible with the OpenAI, Anthropic and Ollama clients.

Important: Run alloy tune