BaseRT: a fast inference runtime for local AI on Apple Silicon

BaseCompute has released BaseRT, a fast inference runtime for local AI on Apple Silicon. The company claims up to 35% faster decode and up to 78% faster prefill, measured in tokens per second on an Apple M4 Pro with 4-bit quantization. Models are served locally, with no API keys and no data leaving the device.

Install:

$ curl -LsSf https://basecompute.co/install.sh | sh

Up to 35% on decode, up to 78% on prefill (tokens/sec, Apple M4 Pro, 4-bit).

Serve a model with BaseRT, point your agent at it, and keep everything on your machine. No API keys, no data leaving your device.
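
To make the headline percentages concrete, here is a minimal sketch of how a "percent faster" figure relates to tokens per second and to wall-clock time. The throughput numbers are hypothetical placeholders chosen for illustration, not BaseRT measurements, and the helper names are invented for this example.

```python
def percent_faster(baseline_tps: float, new_tps: float) -> float:
    """Percent throughput gain of new_tps over baseline_tps (tokens/sec)."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return (new_tps / baseline_tps - 1.0) * 100.0


def seconds_for_tokens(n_tokens: int, tps: float) -> float:
    """Wall-clock seconds to produce n_tokens at a steady tps."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return n_tokens / tps


# Hypothetical decode throughputs (assumed values, not from the announcement).
baseline_decode_tps = 40.0
faster_decode_tps = 54.0

gain = percent_faster(baseline_decode_tps, faster_decode_tps)
print(f"decode gain: {gain:.1f}%")

# Time to generate a 500-token reply at each rate.
t_base = seconds_for_tokens(500, baseline_decode_tps)
t_fast = seconds_for_tokens(500, faster_decode_tps)
print(f"500 tokens: {t_base:.2f}s vs {t_fast:.2f}s")

# A throughput gain of g percent cuts time by 1 - 1/(1 + g/100),
# which is smaller than g itself.
time_saved_pct = (1.0 - t_fast / t_base) * 100.0
print(f"time saved: {time_saved_pct:.1f}%")
```

Note that a 35% throughput gain shortens generation time by roughly 26%, not 35%, because time scales with the inverse of tokens per second.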