05:17
2026-06-04
github.com
large-language-models
Show HN: iPhone ANE holds LLM tok/s while MLX and LiteRT thermal-throttle
A new open-source benchmark, "apple-silicon-llm-bench," reveals that Google's LiteRT-LM runtime outperforms MLX-Swift on the iPhone 17 Pro for Gemma 4 E2B inference, achieving 55.4 tok/s with 4.5x les…