cd /news/ai-infrastructure/basert-a-fast-inference-runtime-for-… · home topics ai-infrastructure article
[ARTICLE · art-46624] src=basecompute.co ↗ pub= topic=ai-infrastructure verified=true sentiment=↑ positive

BaseRT, A fast inference runtime for local AI on Apple Silicon

BaseCompute released BaseRT, a fast inference runtime for local AI on Apple Silicon, claiming up to 35% faster decode and 78% faster prefill on an Apple M4 Pro with 4-bit quantization. The runtime allows users to serve models locally without API keys or data leaving their device.

read1 min views1 publishedJul 1, 2026
BaseRT, A fast inference runtime for local AI on Apple Silicon
Image: source

$ curl -LsSf https://basecompute.co/install.sh | sh

Up to 35% on Decode, up to 78% on Prefill.

Tokens / sec · Apple M4 Pro · 4-bit

Serve a model with BaseRT, point your agent at it, and keep everything on your machine. No API keys, no data leaving your device.

── more in #ai-infrastructure 4 stories · sorted by recency
── more on @basecompute 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/basert-a-fast-infere…] indexed:0 read:1min 2026-07-01 ·