Apple Launches Core AI for Apple-Silicon Optimized On-Device Generative AI

Apple announced Core AI, a new framework for running generative AI models on-device, at WWDC 26. The framework supports large language models up to 70B parameters and runs exclusively on Apple Silicon, ensuring privacy and zero server costs. Core AI succeeds Core ML and provides unified hardware access, a Swift API, and ahead-of-time compilation.

At WWDC 26, Apple announced the Core AI framework (https://developer.apple.com/documentation/coreai), the official successor to Core ML. It is designed to let developers run large language models and generative AI entirely on-device, supporting both custom-converted PyTorch models and pre-optimized open-source models.

Apple says the new framework provides a unified architecture for deploying models ranging from compact 3B-parameter vision models to large-scale LLMs, including reasoning models with up to 70B parameters (https://developer.apple.com/videos/play/wwdc2026/324/?time=33), across the iPhone, iPad, Mac, and Apple Vision Pro. Core AI is the technology underpinning Apple Intelligence, and with the next release of its OSes and toolchain, Apple is making it available to developers to build what it calls "custom intelligence".

Core AI runs only on Apple Silicon. By keeping inference on-device, it keeps user data private and avoids server dependencies and per-token cloud costs.

Key Core AI capabilities include unified hardware access, which lets workloads run seamlessly across the CPU, GPU, and Neural Engine under one API; a memory-safe Swift API that enables zero-copy data paths and fine-grained control over inference memory; and ahead-of-time (AOT) compilation, which shifts work off the user's device and yields near-instant load times.

As mentioned, you can convert a PyTorch model into a Core AI model using Core AI PyTorch (https://apple.github.io/coreai-torch). The simplest approach is to export the PyTorch model as a torch.export.ExportedProgram and convert it to a Core AI AIProgram using TorchConverter().add_exported_program(ep).to_coreai().

Alternatively, you can author a new Core AI model from a PyTorch one by using the built-in composite ops (https://apple.github.io/coreai-torch/main/guides/composite-ops.html) provided by the library, such as attention, RoPE embeddings, RMSNorm, and gather-matmul; by registering custom lowering functions that map new PyTorch ops to Core AI IR; or even by creating custom Metal kernels (https://apple.github.io/coreai-torch/main/guides/custom-metal-kernels.html) for lower-level optimization.

When converting a PyTorch model, a critical step is compressing it for deployment on Apple hardware (https://apple.github.io/coreai-optimization/). This process applies optimization techniques such as quantization (https://apple.github.io/coreai-optimization/quantization/index.html) and palettization (https://apple.github.io/coreai-optimization/palettization/index.html), which by default are designed to align with the execution patterns of the Core AI runtime, ensuring efficient on-device performance. Model compression can reduce your model's memory footprint (both on disk and at runtime), reduce inference latency, reduce power consumption, or improve all of these at once.

One important aspect of running an AIModel is its automatic specialization (https://developer.apple.com/documentation/CoreAI/compiling-core-ai-models-ahead-of-time) to the current hardware and OS version, which is carried out when the model is first loaded into the model cache. As a result, the first attempt to use a model may take significantly longer than subsequent runs, once the model has been cached. Developers can control how and when this process happens by customizing SpecializationOptions (https://developer.apple.com/documentation/coreai/specializationoptions), accessing the AIModelCache (https://developer.apple.com/documentation/coreai/aimodelcache) to check whether a model is already available or to delete cached ones, and even sharing the model cache across an app group.

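
To make the compression step mentioned above more concrete, here is a toy, pure-Python illustration of the two named techniques applied to a flat list of weights. It is not the coreai-optimization API: quantize_int8, palettize, and footprint_gb are hypothetical helper names, the sample weights are invented, and real tools work per tensor or per channel on far larger arrays.

```python
def quantize_int8(weights):
    """Symmetric linear quantization: map floats to integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid a zero scale
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the quantized integers."""
    return [v * scale for v in q]


def palettize(weights, n_bits=2, iterations=10):
    """Cluster weights into 2**n_bits shared values (1-D k-means).

    Each weight is then stored as a small index into the palette.
    """
    k = 2 ** n_bits
    lo, hi = min(weights), max(weights)
    # Start with centroids spread evenly over the weight range.
    palette = [lo + (hi - lo) * i / (k - 1) for i in range(k)]

    def assign():
        return [min(range(k), key=lambda j: abs(w - palette[j])) for w in weights]

    for _ in range(iterations):
        indices = assign()
        for j in range(k):
            members = [w for w, i in zip(weights, indices) if i == j]
            if members:  # leave empty clusters where they are
                palette[j] = sum(members) / len(members)
    return palette, assign()


def footprint_gb(n_params, bits_per_weight):
    """Approximate weight storage in gigabytes (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


weights = [-0.51, -0.48, -0.02, 0.01, 0.03, 0.47, 0.50, 0.52]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

palette, indices = palettize(weights, n_bits=2)
approx = [palette[i] for i in indices]

print(footprint_gb(70_000_000_000, 16))  # 140.0
print(footprint_gb(70_000_000_000, 4))   # 35.0
```

The last two lines show why this matters for the largest models: at 16 bits per weight, 70B parameters need about 140 GB for the weights alone, versus about 35 GB at 4 bits.
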
With the introduction of Core AI, Apple now supports three distinct approaches to running ML/AI on its operating systems: Core ML, Core AI, and MLX Swift. Based on developer discussions on Hacker News, Apple seems to suggest (https://news.ycombinator.com/item?id=48459443) using Core ML for "classic, non-neural ML", such as decision trees or tabular feature engineering; Core AI for neural networks and transformers; and MLX for working with custom model weights, though potentially with lower performance (https://news.ycombinator.com/item?id=48454273).

Community feedback also notes (https://www.reddit.com/r/iOSProgramming/comments/1u1nfxr/comment/or7429o/) that while Core AI "makes it easier to incorporate high-performance LLMs", its long-term value will depend "on the future growth of the official Core AI/community".