
Show HN: iPhone ANE holds LLM tok/s while MLX and LiteRT thermal-throttle

A new open-source benchmark, "apple-silicon-llm-bench," reveals that Google's LiteRT-LM runtime outperforms MLX-Swift on the iPhone 17 Pro for Gemma 4 E2B inference, achieving 55.4 tok/s with 4.5x less memory usage. The benchmark, which tests LLM performance across Apple Silicon devices under real constraints, found MLX-Swift wins on Qwen 3.5 2B decode at 61 tok/s, while CoreML/ANE delivers the smallest memory footprint at 241 MB but the slowest throughput. The results highlight that the optimal runtime depends on the specific model, with purpose-built formats like LiteRT-LM and ANE's memory efficiency challenging general-purpose frameworks.

17 min read · published Jun 4, 2026

On-device LLM benchmark for Apple Silicon: iPhone · iPad · Mac.

A neutral, reproducible benchmark for running local LLMs (and, in time, ASR / TTS) on Apple Silicon. Compares MLX Swift, llama.cpp, CoreML (swift-transformers), LiteRT-LM, ExecuTorch, ANEMLL, and Apple's own Foundation Models, under real device constraints, not just tok/s on a server.

Repo: `apple-silicon-llm-bench` · CLI/brand: `yardstick`. Started life as `ios-llm-benchmark`; iPhone is still the headline target, now measured alongside iPad and Mac.

Real LLM inference on a phone: on-device, no server. iPhone 17 Pro, 4-bit, short-chat (128 tokens), median of 3 cold runs. The winning runtime is model-dependent, and the upset is on Gemma.

Decode throughput in tok/s, higher is better (🏆 = winner):

| Model (4-bit, n=3) | 🔴 LiteRT-LM | 🟣 MLX-Swift | 🔵 llama.cpp | 🟠 CoreML/ANE |
| --- | --- | --- | --- | --- |
| Gemma 4 E2B | 55.4 🏆 | 47.5 | 37.8 | 33.4 |
| Qwen 3.5 2B | n/a | 61.2 🏆 | 39.1 | 27.9 |

Peak memory in MB, lower is better (🏆 = winner):

| Model (4-bit, n=3) | 🔴 LiteRT-LM | 🟣 MLX-Swift | 🔵 llama.cpp | 🟠 CoreML/ANE |
| --- | --- | --- | --- | --- |
| Gemma 4 E2B | 641 🏆 | 2,900 | 3,156 | 1,187 |
| Qwen 3.5 2B | n/a | 1,279 | 1,479 | 241 🏆 |

  • **The upset, Gemma 4 E2B:** Google's LiteRT-LM (INT4-QAT, GPU, its native `.litertlm`) beats MLX-Swift on decode and uses ~4.5× less memory (641 MB vs 2,900). The purpose-built runtime wins on its own format.
  • **MLX-Swift wins Qwen 3.5 2B decode:** 61 vs 39 tok/s. (LiteRT-LM has no Qwen entry; its catalog is Gemma-only.)
  • **CoreML / ANE is the memory champion:** Qwen 3.5 2B in just **241 MB** (~5× leaner than MLX's 1,279) via chunked-MLKV on the Neural Engine, but it's the **slowest decode** (ANE trades throughput for footprint), same story as on M4 Max.
  • **ANE is near-parity with the desktop:** CoreML Gemma 4 E2B does 33 tok/s on iPhone vs 32.5 on M4 Max, same silicon family. The GPU runtimes pay the real on-device tax: ~4–5× slower than M4 Max (Qwen 3.5 2B → 61 tok/s vs 292).
  • **Counting:** MLX / llama.cpp / LiteRT-LM report exact tokenizer tokens (LiteRT-LM via `getBenchmarkInfo`); CoreML/ANE counts streamed pieces (≈ tokens). LiteRT-LM runs to EOS (no per-call cap → ~458-tok reply vs the others' 128 budget); decode tok/s is a rate, so the head-to-head holds.
  • Fully automated, side-loaded via `devicectl` headless mode: nothing typed on the phone, same methodology as the desktop rows.
  • Coming next: Apple Foundation Models, more models and more iPhones / iPads. One row is a great PR.

How the LiteRT-LM row was measured: `google-ai-edge/LiteRT-LM` 0.12.0 running `litert-community/gemma-4-E2B-it.litertlm` (INT4-QAT) on the Metal GPU backend, via the in-tree `MediaPipeRuntime` adapter, with the same headless harness + prompt as every other row (3 cold runs, median). Token counts and tok/s come from LiteRT-LM's own benchmark counters (`Conversation.getBenchmarkInfo`), so they're exact, not estimated. It generates to EOS (no per-call output cap in the API), so its token count is the model's full reply rather than the 128-token budget; decode tok/s is a rate and stays comparable, and memory is exact process RSS. LiteRT-LM is vendored as a local SwiftPM package (`scripts/bootstrap.sh` clones it with `GIT_LFS_SKIP_SMUDGE=1`; the released package trips SwiftPM's unsafe-flags rule via its `-all_load`).
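The "decode tok/s is a rate" argument can be illustrated with a minimal sketch. The run data below is invented for illustration (it is not taken from the repo's JSONL), and the function names are hypothetical; the point is only that a 128-token capped reply and a run-to-EOS reply can be compared once each is reduced to tokens per second and the median of three runs is taken.

```python
from statistics import median


def decode_tok_per_s(decode_tokens: int, decode_seconds: float) -> float:
    """Decode throughput as a rate: generated tokens per second of decode time."""
    if decode_seconds <= 0:
        raise ValueError("decode_seconds must be positive")
    return decode_tokens / decode_seconds


def median_rate(runs):
    """Median decode rate over (token_count, decode_seconds) pairs."""
    return median(decode_tok_per_s(n, s) for n, s in runs)


# Hypothetical runs: one runtime stops at a 128-token budget,
# the other generates until EOS and so emits more tokens per call.
capped_runs = [(128, 2.70), (128, 2.65), (128, 2.75)]
eos_runs = [(458, 8.30), (440, 7.95), (470, 8.50)]

print(f"capped: {median_rate(capped_runs):.1f} tok/s")
print(f"to EOS: {median_rate(eos_runs):.1f} tok/s")
```

Token counts differ by more than 3× between the two groups, but the rates are directly comparable because the longer reply also takes proportionally longer to decode.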

How the CoreML/ANE rows were measured: `john-rocky/CoreML-LLM` on the Neural Engine (`computeUnits: .cpuAndNeuralEngine`), with Gemma 4 E2B via the chunked `.mlmodelc` path and Qwen 3.5 2B via `Qwen35MLKVGenerator` (chunked MLKV, hence the 241 MB). Decode counts streamed pieces (≈ tokens); first-load ANE compilation makes its load time high (and it's the lowest-throughput runtime, since the ANE trades speed for memory).

Decode tok/s is the headline number; the full per-run audit (prefill, TTFT, inter-token jitter, memory) lives in RESULTS.md.

The table above is cold-burst speed. Run the same model continuously and it flips: the GPU runtimes (MLX, LiteRT-LM) heat up and throttle 50%+ within ~60 s, while the ANE barely moves. It draws ~half the power, so it heats slowly and the SoC doesn't throttle it.

| Gemma 4 E2B, iPhone 17 Pro | Burst tok/s | Sustained (10 min) tok/s | Retained |
| --- | --- | --- | --- |
| CoreML / ANE | 33 | 22 | 67% |
| MLX / GPU | 48 | 18 | 38% |
| LiteRT-LM / GPU | 56 | 27 | 48% |

Two independent GPU runtimes collapsing the same way points to a GPU-thermal property of the phone, not a runtime quirk. MLX ends up below the ANE; LiteRT keeps only a slim lead after shedding half its speed. The GPU wins the sprint; the ANE wins the marathon, and it frees the GPU for the rest of the app.
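The "Retained" column is just sustained rate divided by burst rate. A small sketch, using the burst and sustained figures from the table above (the function name is an illustrative choice, not something from the repo):

```python
# Burst and sustained decode rates (tok/s) copied from the table above.
rows = {
    "CoreML / ANE": (33, 22),
    "MLX / GPU": (48, 18),
    "LiteRT-LM / GPU": (56, 27),
}


def retained_pct(burst: float, sustained: float) -> float:
    """Share of the burst decode rate still available under sustained load."""
    return 100.0 * sustained / burst


for name, (burst, sustained) in rows.items():
    print(f"{name:16s} {retained_pct(burst, sustained):.0f}%")
```

Run as-is, this reproduces the 67% / 38% / 48% column.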

Method: 600 s continuous generation, cold (`nominal`) start, unplugged, tg128; decode rate from a rolling window. Raw JSONL in `results/raw/iphone17pro-*-energy-tg128.jsonl`; redraw with `scripts/throttle_chart.py` (curves table via `scripts/throttle_curve.py`). LiteRT-LM has no output-token cap (longer per-call) and that run started at `fair` thermal; CoreML-LLM uses sliding-window attention (bounded context), part of why its decode stays flat.
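One way to read "decode rate from a rolling window" is sketched below. This is an assumption about the technique, not the repo's `throttle_curve.py`: the 10-second window, the per-token timestamp input, and the function name are all illustrative.

```python
from collections import deque


def rolling_decode_rate(token_times, window_s=10.0):
    """Yield (t, tok/s), counting only tokens emitted in the trailing window.

    `token_times` is an ascending iterable of per-token timestamps in seconds.
    Nothing is yielded until one full window has elapsed since the first token.
    """
    window = deque()
    start = None
    for t in token_times:
        if start is None:
            start = t
        window.append(t)
        # Drop tokens that are older than the trailing window.
        while window[0] <= t - window_s:
            window.popleft()
        if t - start >= window_s:
            yield t, len(window) / window_s


# Synthetic trace: 1 tok/s for 30 s, then throttled to 0.5 tok/s.
times = list(range(0, 30)) + list(range(30, 90, 2))
curve = list(rolling_decode_rate(times, window_s=10))
print(curve[0], curve[-1])
```

A trailing window smooths out per-token noise while still showing when the rate steps down, which is what a throttle curve needs.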

The same harness on a laptop-class chip, for scale. No runtime wins everything here; each optimises a different corner of the throughput / memory / energy / streaming box:

  • mlx-swift wins decode throughput on every cell measured (1.4×–2.1× over llama.cpp after early-2026 kernel updates).
  • Apple Foundation Models is 2× more energy-efficient per token than the GPU-backed runtimes, 4× more than CoreML/ANE.
  • CoreML / ANE wins peak memory (chunked MLKV) but is the slowest and the worst on J/token.
  • llama.cpp sits in the middle on speed and energy: no axis it wins, no axis it loses badly.

Tables for the exact numbers live below.

Regenerate after adding rows: `python scripts/generate_charts.py`.

One device, four runtimes, multiple models. Decode tok/s is the primary headline number; the full table (prefill, TTFT, peak memory, per-run audit trail) lives in RESULTS.md. Read the Headline observations section before drawing conclusions: the runtime ranking is model-size-dependent.

| Logical model | Params | n | mlx-swift (Q4) | llama.cpp (Q4_K_M) | coreml-llm | litert-lm (.litertlm) |
| --- | --- | --- | --- | --- | --- | --- |
| Qwen 2.5 0.5B | 0.5 B | 3 | 531.1 | 297.1 | 181.2 (FP16) | n/a |
| Qwen 3.5 0.8B | 0.8 B | 3 | 421.1 | 201.1 | 58.2 (INT8) | n/a |
| Qwen 3.5 2B | 2 B | 3 | 291.9 | 149.7 | 35.0 (INT8) | n/a |
| Gemma 4 E2B | 2 B | 3 | 185.4 | 119.2 | 32.5 (INT4 palettized) | pending |
| Gemma 4 E4B | 4 B | 3 | 113.5 | 80.5 | not run | pending |

`litert-lm` column: pending = adapter wired against `google-ai-edge/LiteRT-LM` v0.12.0, M4 Max run not yet captured (see RESULTS.md / Yardstick_USER_RUNS.md). n/a = LiteRT-LM's catalog is Gemma-only (`.litertlm`), so the Qwen rows have no entry. For reference, Google's E2B model card reports 56.5 tok/s on iPhone 17 Pro GPU; that is a vendor figure on a different device, not an M4 Max Yardstick measurement.

→ MLX-Swift now wins decode on every cell, by 1.4×–2.1× over llama.cpp, after upstream `mlx-swift-lm` shipped Qwen + Gemma kernel updates in early 2026 (the Qwen rows roughly tripled vs. the snapshot captured before those landed). The old "llama.cpp Metal always wins small-model decode" rule is no longer true on M4 Max; re-measure before quoting it. CoreML / ANE is the slowest of the three on every cell, in exchange for the dramatic memory savings shown below.
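The per-row MLX-Swift advantage can be recomputed directly from the decode table. This is plain arithmetic on the published M4 Max numbers, with no repo tooling assumed:

```python
# Decode tok/s on M4 Max, copied from the table above: (mlx-swift, llama.cpp).
decode = {
    "Qwen 2.5 0.5B": (531.1, 297.1),
    "Qwen 3.5 0.8B": (421.1, 201.1),
    "Qwen 3.5 2B": (291.9, 149.7),
    "Gemma 4 E2B": (185.4, 119.2),
    "Gemma 4 E4B": (113.5, 80.5),
}

# Speedup of mlx-swift over llama.cpp for each model row.
speedups = {model: mlx / lcpp for model, (mlx, lcpp) in decode.items()}

for model, s in speedups.items():
    print(f"{model:14s} {s:.2f}x")
```

On these rows the ratio runs from about 1.41× (Gemma 4 E4B) to about 2.09× (Qwen 3.5 0.8B), so the gap narrows as the model grows.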

The decode-tok/s table above hides the memory side. Same models, looking at peak working-set instead:

| Logical model | Params | mlx-swift (MB) | llama.cpp (MB) | coreml-llm (MB) | litert-lm (MB) |
| --- | --- | --- | --- | --- | --- |
| Qwen 2.5 0.5B | 0.5 B | 390 | 538 | 962 | n/a |
| Qwen 3.5 0.8B | 0.8 B | 600 | 752 | 221 (INT8) | n/a |
| Qwen 3.5 2B | 2 B | 1223 | 1443 | 230 (INT8) | n/a |
| Gemma 4 E2B | 2 B | 2829 | 3212 | 1036 | pending |
| Gemma 4 E4B | 4 B | 4376 | 5150 | not run | pending |

→ "CoreML/ANE wins memory" is true once the chunked MLKV layout kicks in. At 0.5 B params MLX-Swift is still smaller (390 MB vs CoreML's 962 MB monolithic FP16); from 0.8 B onward, CoreML's chunked MLKV path (`Qwen35MLKVGenerator`: mmap'd embed sidecar + on-demand ANE chunks) holds the process RSS roughly flat, at 221 MB for 0.8 B and 230 MB for 2 B, while MLX and llama.cpp grow with parameter count.

The number nobody else publishes: how many joules does each backend burn per generated token? Captured via `scripts/measure_energy.py`, which co-runs `powermetrics` (whole-system, package power = CPU + GPU + ANE) and clips the sample window to the bench's reported active time. The ANE path draws ~half the GPU path's package power at full decode (12.7 W vs ~24.7 W), the same power gap that makes the GPU runtimes thermally throttle on iPhone while the ANE holds its rate (see the sustained-throttle section above).

| Runtime | Avg pkg power (W) | Energy / 512-tok run (J) | J / token |
| --- | --- | --- | --- |
| apple-fm (system model) | 7.6 | 67.4 | 0.11 |
| mlx-swift (4-bit MLX) | 24.7 | 123.0 | 0.24 |
| llama.cpp (Q4_K_M, GGUF) | 24.5 | 126.3 | 0.25 |
| coreml-llm (INT4 palettized, ANE) | 12.7 | 244.9 | 0.48 |

→ The energy ranking does not follow the decode-tok/s ranking. Apple FM is 2× more efficient per token than the GPU-backed runtimes despite producing tokens at ~half the rate. CoreML/ANE draws about half the package power of the GPU runtimes (12.7 W) but is the worst on J/tok at 4× Apple FM, because the slower decode (32 tok/s) keeps the package powered up much longer. MLX-Swift and llama.cpp draw the most W (GPU) but produce tokens fast enough to land at ~0.24 J/tok. Whole-system measurement includes the idle baseline, so all four numbers slightly inflate per-token energy; they are useful for ranking, not for absolute attribution. iPhone energy uses the 1 %-battery-step API instead (different methodology, similar table shape).
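The "clip the sample window to the active time" step can be sketched as a simple interval integration. This is an assumption about the approach, not the code in `scripts/measure_energy.py`: the `(start, end, watts)` sample shape is a simplified stand-in for parsed `powermetrics` output, and the trace below is invented so that its active phase averages 24.7 W like the mlx-swift row.

```python
def energy_in_window(samples, start_s, end_s):
    """Integrate package power over [start_s, end_s], in joules.

    `samples` is a list of (t_start_s, t_end_s, avg_power_w) intervals.
    Only the part of each interval that overlaps the window is counted.
    """
    joules = 0.0
    for t0, t1, watts in samples:
        overlap = min(t1, end_s) - max(t0, start_s)
        if overlap > 0:
            joules += watts * overlap
    return joules


def joules_per_token(samples, start_s, end_s, tokens):
    """Energy in the active window divided by the tokens generated in it."""
    return energy_in_window(samples, start_s, end_s) / tokens


# Invented 10 s trace: idle at 6 W, with a 24.7 W burst between t=2 s and t=7 s.
trace = [(i, i + 1, 24.7 if 2 <= i < 7 else 6.0) for i in range(10)]

# Hypothetical active window reported by the bench: 2.00 s to 6.98 s, 512 tokens.
energy = energy_in_window(trace, 2.0, 6.98)
per_token = joules_per_token(trace, 2.0, 6.98, 512)
print(f"{energy:.1f} J, {per_token:.2f} J/token")
```

Clipping matters because the idle seconds before and after the run would otherwise be billed to the runtime and inflate its J/token.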

llama.cpp (Q4_K_M GGUF, M4 Max, short-chat)

| Model | Params | n | TTFT (ms) | Decode tok/s | Peak Mem (MB) |
| --- | --- | --- | --- | --- | --- |
| Qwen 2.5 0.5B | 0.5 B | 3 | 22 | 297.1 | 538 |
| Qwen 3.5 0.8B | 0.8 B | 3 | 22 | 201.1 | 752 |
| Llama 3.2 1B | 1.0 B | 3 | 25 | 285.9 | 1022 |
| Qwen 3.5 2B | 2 B | 3 | 29 | 149.7 | 1443 |
| Gemma 4 E2B | 2 B | 3 | 41 | 119.2 | 3212 |
| Gemma 4 E4B | 4 B | 3 | 62 | 80.5 | 5150 |

mlx-swift (Q4 / MLX, M4 Max, short-chat)

| Model | Params | n | TTFT (ms) | Decode tok/s | Peak Mem (MB) |
| --- | --- | --- | --- | --- | --- |
| Qwen 2.5 0.5B | 0.5 B | 3 | 21 | 531.1 | 390 |
| Qwen 3.5 0.8B | 0.8 B | 3 | 36 | 421.1 | 600 |
| Qwen 3.5 2B | 2 B | 3 | 42 | 291.9 | 1223 |
| Gemma 4 E2B | 2 B | 3 | 68 | 185.4 | 2829 |
| Gemma 4 E4B | 4 B | 3 | 90 | 113.5 | 4376 |

coreml-llm (CoreML / ANE, M4 Max, short-chat)

| Model | Params | n | TTFT (ms) | Decode tok/s | Peak Mem (MB) |
| --- | --- | --- | --- | --- | --- |
| LFM 2.5 350M | 0.35 B | 1 | 383 | 58.9 | 98 |
| Qwen 2.5 0.5B | 0.5 B | 3 | 171 | 181.2 | 962 |
| Qwen 3.5 0.8B | 0.8 B | 3 | 405 | 58.2 | 221 |
| Qwen 3.5 2B | 2 B | 3 | 665 | 35.0 | 230 |
| Gemma 4 E2B | 2 B | 3 | 525 | 32.5 | 1036 |

→ CoreML/ANE trades throughput for memory: 3–8× less peak working set than MLX-Swift / llama.cpp at the same model size, at ~half the decode tok/s. The Qwen 3.5 0.8B / 2B numbers come from the dedicated `Qwen35MLKVGenerator` (ANE chunked decode, KV in `MLState`, public API since CoreML-LLM v1.9.0), not the generic `CoreMLLLM.load(from:)` path.

Apple FM is a single pre-installed model, so it can't share a "logical model" row with the open-weight runtimes above. It earns its own line as a reference point: the number to beat when "just use the system model" is the alternative.

| Runtime | Model | n | TTFT (ms) | Decode tok/s | Peak Mem (MB, in-process) |
| --- | --- | --- | --- | --- | --- |
| apple-fm | Apple Foundation Model (default, ~3 B params est.) | 3 | 269 | 85.2 | 27 |

Caveats: read before comparing.

  • Tokens are estimated (`utf8.count / 4`) because `FoundationModels` does not expose the tokenizer. Treat decode tok/s as ±20%; the other runtimes report counts from their actual tokenizer.
  • Peak memory is in-process only. The model lives in Apple's system process, not ours, so 27 MB is the harness overhead, not the true model footprint. Use Activity Monitor / `powermetrics` for the system-wide picture.
  • Quant is Apple-internal. Community reverse-engineering puts it at ~2-bit base weights + 4-bit task adapters; Apple has not published numbers. Don't read the decode tok/s as a comment on any specific quant choice.
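The token estimate in the first caveat is a bytes-divided-by-four heuristic. A Python rendering of the same arithmetic, as a sketch (the harness itself is Swift; the function names and the ±20% band helper are illustrative):

```python
def estimate_tokens(text: str) -> int:
    """Approximate token count as UTF-8 byte length divided by 4 (integer division)."""
    return len(text.encode("utf-8")) // 4


def estimated_rate_band(text: str, decode_seconds: float, tolerance: float = 0.20):
    """Return (low, mid, high) decode tok/s, treating the estimate as +/- tolerance."""
    mid = estimate_tokens(text) / decode_seconds
    return mid * (1 - tolerance), mid, mid * (1 + tolerance)


reply = "On-device inference keeps the prompt and the reply on the phone."
low, mid, high = estimated_rate_band(reply, decode_seconds=0.2)
print(estimate_tokens(reply), f"{low:.0f}-{high:.0f} tok/s")
```

Because the count is byte-based, text heavy in multi-byte characters is estimated as more tokens per character than plain ASCII, which is one reason to treat the figure as a band rather than a point.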

Full results: by model, by runtime, full per-run audit trail →

This table is the repo. The easiest possible contribution is one new row. All three of these are equally valuable:

  • A new device. Run the existing models on your iPhone / iPad / Mac. Tooling in Yardstick_USER_RUNS.md. The "Devices wanted" list at the bottom of RESULTS.md is the shortlist.
  • A new model. Drop the model id into the `ModelCatalog` for the runtime that can load it.
  • A new runtime. Wire it up in `ios/BenchmarkApp/Sources/Runtimes/` following the `LLMRuntime` protocol; the harness will pick it up.

Workflow once you have the build set up:

for run in 1 2 3; do
  yardstick run --task short-chat \
                --runtime mlx-swift \
                --model <id-or-hf-repo> \
                --output results/raw/<device>-<runtime>-<model>-short-chat-run${run}.jsonl
done

python scripts/render_results.py

CI runs `python scripts/render_results.py --check` on every PR; it fails if the JSONLs and the tables disagree, so the human-edited section of RESULTS.md cannot drift out of sync with the raw data.

Full step-by-step (build, model picker, device-specific gotchas) lives in CONTRIBUTING.md.

Per (runtime, model, device, build) tuple:

  • **Speed:** TTFT, prefill tok/s, decode tok/s, sustained-decode drift over 512+ tokens.
  • **Memory:** baseline, peak during decode, after-generation.
  • **Thermal:** initial / peak / final state across the run.
  • **Jitter:** inter-token latency p50 / p95 / p99 ms, captured from the gap between consecutive `.chunk` events. Surfaces the worst-case stall a chat UI will perceive even when the average decode rate looks smooth.
  • **Energy:** joules per token. iOS uses the 1%-battery-step API; Mac uses `scripts/measure_energy.py` (wraps `powermetrics`, see "Optional: capture Mac energy" below).
  • **Lifecycle:** survives background → foreground, cancellation latency, streaming.
  • **Quality** *(roadmap)*: WER / CER for ASR, perplexity / MMLU for LLM, byte-identical comparison vs Python references.
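The jitter metric can be sketched as percentiles over the gaps between consecutive chunk timestamps. The nearest-rank percentile definition and the input shape (a list of arrival times in seconds) are assumptions for illustration; the harness may compute its percentiles differently.

```python
def percentile(sorted_vals, p):
    """Nearest-rank percentile (integer p in 1..100) of an ascending list."""
    if not sorted_vals:
        raise ValueError("no samples")
    # Integer ceiling of p * n / 100 avoids floating-point rank errors.
    rank = max(1, (p * len(sorted_vals) + 99) // 100)
    return sorted_vals[rank - 1]


def inter_token_jitter(chunk_times_s):
    """Return p50 / p95 / p99 of the gaps between consecutive chunks, in ms."""
    gaps_ms = sorted(
        (later - earlier) * 1000.0
        for earlier, later in zip(chunk_times_s, chunk_times_s[1:])
    )
    return {f"p{p}": percentile(gaps_ms, p) for p in (50, 95, 99)}


# Synthetic stream: steady 20 ms gaps with a single 200 ms stall in the middle.
times = [0.02 * i for i in range(31)] + [0.8 + 0.02 * i for i in range(20)]
jitter = inter_token_jitter(times)
print({k: round(v) for k, v in jitter.items()})
```

In this synthetic stream the median and p95 sit at the steady 20 ms gap while p99 picks up the stall, which is the case the average decode rate hides.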

Methodology lives under methodology/. The numbers we publish follow `methodology/fairness-rules.md`.

Optional: capture Mac energy.

sudo python scripts/measure_energy.py run \
     --task short-chat --runtime mlx-swift \
     --model mlx-community/gemma-4-e2b-it-4bit \
     --output results/raw/<device>-<runtime>-<model>-<task>-energy.jsonl

The wrapper starts `powermetrics` in the background, runs `yardstick`, stops `powermetrics`, then patches the JSONL with `energyJoules`, `averagePackagePowerW`, and `energyJoulesPerToken`. Numbers are whole-system: run on an idle desktop and use them to compare runtimes on the same Mac, not Macs to each other.
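The patch step could look roughly like the sketch below. The three energy field names come from the text above; everything else is an assumption: `decodeTokens` is a hypothetical name for the run's token count, and the real script may derive energy from integrated samples rather than average power times active time.

```python
import json


def patch_energy(record: dict, avg_power_w: float, active_s: float) -> dict:
    """Return a copy of one run record with the three energy fields added."""
    energy_j = avg_power_w * active_s
    patched = dict(record)
    patched["energyJoules"] = energy_j
    patched["averagePackagePowerW"] = avg_power_w
    # "decodeTokens" is a hypothetical field name for the generated-token count.
    patched["energyJoulesPerToken"] = energy_j / record["decodeTokens"]
    return patched


def patch_jsonl(path, avg_power_w: float, active_s: float) -> None:
    """Rewrite a JSONL file in place, patching every record."""
    with open(path) as f:
        records = [json.loads(line) for line in f if line.strip()]
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(patch_energy(rec, avg_power_w, active_s)) + "\n")


example = patch_energy({"runtime": "mlx-swift", "decodeTokens": 512}, 24.7, 4.98)
print(json.dumps(example, indent=2))
```

Returning a patched copy rather than mutating the record keeps the raw measurement and the derived energy fields easy to tell apart when debugging.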

The iOS app's History → ••• → Export all (JSONL) sheet hands you a single newline-delimited file. AirDrop it to your Mac, then:

python scripts/import_ios_export.py ~/Downloads/yardstick-*.jsonl
python scripts/render_results.py

The import script splits the bundle into one `results/raw/<device>-<runtime>-<model>-<task>-runN.jsonl` per row, re-keying the device label so `render_results.py` recognises it.
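The splitting step might be planned as in this sketch. It is not the repo's `import_ios_export.py`: the record keys `device`, `runtime`, `model`, and `task`, the alias table used for re-keying, and the slug rule are all hypothetical, and the real export schema may differ.

```python
import json
import re
from collections import defaultdict


def slug(value: str) -> str:
    """Make a value safe for a file name (for example, replace '/' in a model id)."""
    return re.sub(r"[^A-Za-z0-9._]+", "-", value).strip("-")


def plan_split(lines, device_aliases=None):
    """Map each exported record to a per-run file name, numbering repeats as runN."""
    aliases = device_aliases or {}
    counter = defaultdict(int)
    plan = {}
    for line in lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        # Re-key the device label to the name the results tables expect.
        device = aliases.get(rec["device"], rec["device"])
        key = tuple(slug(v) for v in (device, rec["runtime"], rec["model"], rec["task"]))
        counter[key] += 1
        plan["-".join(key) + f"-run{counter[key]}.jsonl"] = rec
    return plan


row = {"device": "iPhone18,1", "runtime": "mlx-swift",
       "model": "mlx-community/gemma-4-e2b-it-4bit", "task": "short-chat"}
bundle = [json.dumps(row), json.dumps(row), ""]
for name in plan_split(bundle, {"iPhone18,1": "iphone17pro"}):
    print(name)
```

Keeping the planning step free of file I/O makes it easy to check the naming before anything is written under `results/raw/`.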

Yardstick/
├── Package.swift              SPM: YardstickKit library + `yardstick` Mac CLI
├── apple/
│   └── YardstickCLI/          Mac command-line runner
├── ios/
│   └── BenchmarkApp/          On-device iOS app (`.xcodeproj`)
├── runtimes/                  Per-runtime notes (adapters, gotchas, version pins)
├── devices/                   Per-device pages (chip, RAM, OS, build, signing)
├── methodology/               How we measure each axis fairly
├── models/                    Curated model catalog
├── prompts/                   Standardized prompts per task
└── results/
    ├── raw/                   JSONL dumps per run
    └── (summary tables generated into RESULTS.md)

Current status (May 2026): SPM build is clean. Runtime is blocked by ml-explore/mlx-swift#349: the MLX Metal kernel bundle isn't emitted by `swift build` from a downstream package, so `swift run yardstick run …` exits with `Failed to load the default metallib`. The same workaround applies to `mlx-swift-examples/llm-tool` (its README says "Build the llm-tool scheme in Xcode"). A macOS app target that wraps the CLI through Xcode's Metal toolchain is queued as Phase 2.

When the Phase-2 macOS target lands, this is the intended shape:

$ yardstick list
$ yardstick run --task short-chat \
                --runtime mlx-swift \
                --model mlx-community/Qwen3-0.6B-4bit \
                --output results/raw/m4max-mlx-qwen3-0.6b.jsonl

For now, build verification only:

$ swift build       # Build complete!

cd ios/BenchmarkApp
./scripts/bootstrap.sh           # downloads llama.xcframework + Anemll source
open BenchmarkApp.xcodeproj      # set your Team in Signing & Capabilities

First launch downloads the chosen model (default: `mlx-community/gemma-4-e2b-it-4bit`, ~1.3 GB) into the app's Documents directory. Use the picker to swap.

| Runtime | Adapter | Wire-up |
| --- | --- | --- |
| MLX Swift | MLXRuntime.swift | SPM (mlx-swift-lm) |
| llama.cpp | LlamaCppRuntime.swift | vendored llama.xcframework (bootstrap.sh) |
| CoreML (swift-transformers) | CoreMLRuntime.swift | SPM (swift-transformers Models + Generation) |
| LiteRT-LM | MediaPipeRuntime.swift | SPM (google-ai-edge/LiteRT-LM ≥ 0.12.0, product LiteRTLM); `#if canImport(LiteRTLM)`-gated |
| ExecuTorch | ExecuTorchRuntime.swift | SPM (pytorch/executorch swiftpm-* branch) |
| ANEMLL | AnemllRuntime.swift | local SPM via vendored Anemll/ (bootstrap.sh) |
| Apple Foundation Models | AppleFMRuntime.swift | system framework, `#if canImport(FoundationModels)` (macOS 26 / iOS 26) |

Adapters whose framework isn't present at build time are gated with `#if canImport(...)` and fall back to a clear "not added" error rather than failing the build.

Verified in-tree:

  • devices/mac-m4-max.md: Apple M4 Max (macOS 26)
  • devices/macbook-air-m3.md: MacBook Air M3, 16 GB (macOS 26)
  • devices/iphone-17-pro.md: iPhone 17 Pro (iOS 26)

Community devices wanted. If you have an Apple Silicon device not listed above, the fastest way to contribute a row to RESULTS.md is to:

  • Add a devices/<your-device>.md describing the hardware/OS/build.
  • Run the app or CLI per methodology/measurement.md.
  • PR the resulting results/raw/<device>-*.jsonl and the updated RESULTS.md rows.

Devices we'd love numbers for:

  • iPhone 15 Pro / 16 Pro / 17 Pro Max / 17 Air
  • iPad Pro M2 / M4
  • MacBook Pro M1 / M2 / M3 / M4 (Pro / Max)
  • Mac Studio Ultra (M2 Ultra / M3 Ultra)
  • Mac mini M2 / M4

| Backend | Build on Mac | Run on Mac | Notes |
| --- | --- | --- | --- |
| MLX Swift LM | ✅ | ✅ | Native SPM macOS. The Xcode-built tool target sidesteps mlx-swift#349. |
| llama.cpp | ✅ | ✅ | macos-arm64_x86_64 slice in Vendored/llama.xcframework. CLI uses LD_RUNPATH_SEARCH_PATHS to resolve the framework at runtime. |
| CoreML (CoreMLLLM) | ✅ | ✅ (some models) | macOS 15+. Models with the single-top-level .mlpackage layout (e.g. LFM 2.5 350M) auto-download from HF and run; the chunked / multi-.mlpackage repos (e.g. mlboydaisuke/qwen3.5-0.8B-CoreML) need upstream CoreMLLLM work to load. |
| ExecuTorch | ✅ | ⏸ | Build path is clean; current ET-community models ship SentencePiece tokenizer.model but ET's hf_tokenizer.cpp expects HF-format tokenizer.json. Needs a model with HF tokenizer or an ET-side SentencePiece adapter. |
| ANEMLL | ✅ | ⏸ | Build path is clean; swift-huggingface.HFDown fails on .mlmodelc/ directory-shaped HF repos. Needs upstream down work. |
| LiteRT-LM | ✅ | ⏸ | google-ai-edge/LiteRT-LM v0.12.0 ships ios-arm64 + macos-arm64 slices, wired via SPM (product LiteRTLM, macOS 12+). Build path clean; M4 Max run pending. Watch the package's -all_load for duplicate-symbol clashes with the vendored llama / executorch static libs (fall back to scoped -force_load). |

  • Phase 1: repo rename, top-level SPM (YardstickKit + yardstick CLI), Mac CLI builds clean, README + device pages, methodology docs, iOS app intact.
  • Phase 2: Mac CLI runs end-to-end (via Xcode-built target to sidestep mlx-swift #349), first M4 Max numbers committed to RESULTS.md.
  • Phase 2.5: all 5 buildable backends (MLX, llama.cpp, CoreML, ExecuTorch, ANEMLL) wired into the Mac tool target; first cross-backend row (Gemma 4 E2B: MLX vs llama.cpp).
  • Phase 3 *(in progress)*: fill remaining adapter row gaps (down + model-format work, mostly upstream), MacBook Air M3 + iPhone 17 Pro numbers via [Yardstick_USER_RUNS.md](../Yardstick_USER_RUNS.md).
  • Phase 4: quality / accuracy tasks: WER + CER (reusing swift-transformers Whisper normalizer), perplexity, MMLU subset. ASR + TTS adapters (WhisperKit, Apple Speech, system TTS).
  • Phase 5: public results dashboard, regeneration CI, comparison plots.

MIT, see LICENSE.
