# Show HN: mlx-chronos - benchmark MLX inference engines on Apple Silicon

Benchmark suite and community leaderboard for local LLM inference on Apple Silicon. Run a reproducible benchmark, save a sealed JSON result, and compare engines across Macs.

- Overview
- Supported Engines
- Quick Start
- CLI Reference
- Configuration
- Benchmark Protocol
- Leaderboard Rules
- Submit Results
- Roadmap

## Overview

mlx-chronos is a standardized benchmark tool for local LLM inference engines on Apple Silicon. It detects your Mac, runs a fixed benchmark protocol against an OpenAI-compatible engine endpoint, and writes structured result files for local analysis or public leaderboard submission. The public leaderboard is available at [igurss.github.io/mlx-chronos](https://igurss.github.io/mlx-chronos).
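The Overview says the benchmark talks to an OpenAI-compatible endpoint. As an illustration only (this is not mlx-chronos's own code), the sketch below shows how a client could derive two of the metrics defined in the table that follows, TTFT and request throughput, from a streamed chat-completion response. It assumes the engine emits OpenAI-style server-sent events (`data: {...}` lines ending with `data: [DONE]`) and reports `usage.completion_tokens` in a chunk; `RequestTiming` and `time_sse_stream` are hypothetical names. The clock is injectable so the timing logic can be checked without a running server. The cold/cached TTFT distinction (cache-avoiding prompts, a priming call) is not modeled here.

```python
import json
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class RequestTiming:
    """Hypothetical container for one request's client-side timings."""
    ttft_s: Optional[float]            # seconds until the first non-empty content delta
    total_s: float                     # full client-observed request time
    completion_tokens: Optional[int]   # from the engine's usage block, if it sent one

    @property
    def throughput_tok_s(self) -> Optional[float]:
        # Request throughput: completion tokens divided by full request time.
        # Returns None when the engine reported no token usage.
        if self.completion_tokens is None or self.total_s <= 0:
            return None
        return self.completion_tokens / self.total_s


def time_sse_stream(
    lines: Iterable[str],
    start: float,
    clock: Callable[[], float] = time.perf_counter,
) -> RequestTiming:
    """Consume OpenAI-style SSE lines as they arrive and time the request.

    Assumption: `start` was read from the same clock just before the request
    was sent, and `lines` yields response lines as they stream in.
    """
    ttft: Optional[float] = None
    completion_tokens: Optional[int] = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        usage = chunk.get("usage")
        if usage and usage.get("completion_tokens") is not None:
            completion_tokens = usage["completion_tokens"]
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            # TTFT counts only the first non-empty token, not role-only deltas.
            if ttft is None and content:
                ttft = clock() - start
    return RequestTiming(ttft, clock() - start, completion_tokens)
```

In a real run, `start` would be read with `time.perf_counter()` just before the HTTP request is sent, and `lines` would be the response body consumed incrementally; reading the clock per chunk is what makes TTFT reflect the first non-empty token rather than the end of the response. OpenAI's own API only includes usage in streamed responses when the request sets `stream_options: {"include_usage": true}`. Whether a given local engine behaves the same way is an assumption to verify, which is why throughput falls back to `None` when no usage is reported.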
mlx-chronos reports the following metrics:

| Metric | Meaning | Public comparison use |
|---|---|---|
| TTFT cold | Time from request start to the first non-empty streamed token, using cache-avoiding prompts | Yes |
| TTFT cached | Time to first token after a cache-priming call with the same prompt | Yes |
| Request throughput | Completion tokens divided by the full client-observed request time | Yes, when engine token usage is reliable |
| Sustained throughput | Optional long throughput run that captures heat buildup and late-run degradation | Yes, under the sustained profile |
| System RAM peak | Peak total Mac RAM in use during the benchmark | Yes |
| Engine RSS | Post-warmup resident set size (RSS) of the engine server process, when identifiable | Diagnostic only |
| Thermal state | Start, end, and worst state, samples, and affected benchmark phases, when available | Context metadata |
| Tool calling | Planned success-rate benchmark | Not yet available |

Version 0.3.1 simplifies public model identity metadata to model name, quantization, model format, and the required model reference URL. It keeps the guided workflows, timing metadata, and stricter leaderboard integrity checks introduced in 0.3.0.

## Supported Engines

- Ollama
- [jundot/omlx](https://github.com/jundot/omlx)
- [raullenchai/Rapid-MLX](https://github.com/raullenchai/Rapid-MLX)
- [waybarrios/vllm-mlx](https://github.com/waybarrios/vllm-mlx)
- [ml-explore/mlx-lm](https://github.com/ml-explore/mlx-lm)

> **Note:** The engine server must already be running before `mlx-chronos run`, `mlx-chronos models`, or `mlx-chronos validate` can query it. See CONTRIBUTING.md for engine setup details.

## Quick Start

Install the package:

`pip install mlx-chronos`

Optional thermal-state support through macOS Foundation/PyObjC:

`pip install "mlx-chronos[thermal]"`

Check the installed version, or upgrade to the latest release:

`mlx-chronos --version`

`mlx-chronos upgrade`

When run in an interactive terminal, mlx-chronos performs a best-effort background PyPI version check.
If a newer release is available, it prints a short notice recommending `mlx-chronos upgrade`. Set `MLX_CHRONOS_DISABLE_UPDATE_CHECK=1` to disable the automatic check.

List supported engines, list the models an engine exposes, and validate a model:

`mlx-chronos engines`

`mlx-chronos models --engine omlx`

`mlx-chronos validate --engine omlx --model "Qwen3.5-4B-OptiQ-4bit"`

Open the interactive wizard:

`mlx-chronos wizard`

The wizard provides a terminal menu for common actions and a guided benchmark builder covering engine, model, profile, token bounds, output format, cooldown, preflight, notes, and other run options. When the selected engine server is running, the wizard loads `/models` and lets you pick a model from the exposed IDs, with manual entry as a fallback. Before launching a benchmark, it shows the equivalent `mlx-chronos run ...` command so the same configuration can be reused in scripts. You can return to the main menu from benchmark setup without starting a run.

Run a benchmark:

`mlx-chronos run --engine omlx --model "Qwen3.5-4B-OptiQ-4bit"`

Results are written to `results/local/` by default. Common variations:

- Write both JSON and Markdown outputs: `mlx-chronos run --engine omlx --model "Qwen3.5-4B-OptiQ-4bit" --format all`
- Choose a custom output directory: `mlx-chronos run --engine omlx --model "Qwen3.5-4B-OptiQ-4bit" --output-dir ~/Desktop/benchmarks`
- Set output token bounds for the request-throughput run in local experiments: `mlx-chronos run --engine omlx --model "Qwen3.5-4B-OptiQ-4bit" --max-tokens 100 --min-tokens 80`
- Run the longer, heat/throttling-sensitive sustained profile: `mlx-chronos run --engine omlx --model "Qwen3.5-4B-OptiQ-4bit" --profile sustained`
- Enforce a cooldown after a recent run in the same output directory: `mlx-chronos run --engine omlx --model "Qwen3.5-4B-OptiQ-4bit" --cooldown-seconds 300`
- Fail fast with an extra model-access probe before measured work starts: `mlx-chronos run --engine omlx --model "Qwen3.5-4B-OptiQ-4bit" --preflight`
- Include a model reference URL (required for public leaderboard submissions): `mlx-chronos run --engine omlx --model "Qwen3.5-4B-OptiQ-4bit" --model-url "https://huggingface.co/mlx-community/Qwen3.5-4B-OptiQ-4bit"`

## CLI Reference

| Command | Purpose |
|---|---|
| `mlx-chronos --version` | Print the installed package version |
| `mlx-chronos wizard` | Open an interactive menu for common commands and guided benchmark setup |
| `mlx-chronos upgrade` | Check PyPI and upgrade the current Python environment if a newer release exists |
| `mlx-chronos engines` | List supported engines and their local installed/running status |
| mlx-chronos models --engine