# Apple Debuts Third-Generation Foundation Models and AFM Core Advanced

> Source: <https://letsdatascience.com/news/apple-debuts-third-generation-foundation-models-and-afm-core-d7aeb6dc>
> Published: 2026-06-12 03:50:57.439219+00:00

Apple introduced the third generation of Apple Foundation Models (AFM), a family of five models spanning on-device and server deployments, in a June 8, 2026 post on its machine learning research site. The set includes two on-device models, AFM 3 Core and AFM 3 Core Advanced, and three server models that run on Private Cloud Compute: AFM 3 Cloud, ADM 3 Cloud (an image model), and AFM 3 Cloud Pro. Apple describes AFM 3 Core Advanced as a 20-billion-parameter, natively multimodal on-device model that uses a sparse architecture, activating only 1 to 4 billion parameters per request so it can run on Apple silicon. Apple says it worked with Google and NVIDIA to extend Private Cloud Compute so that AFM 3 Cloud Pro can run on NVIDIA GPUs in Google Cloud while preserving its privacy guarantees. A January 12, 2026 joint statement from Apple and Google framed the next-generation AFM family as built with Google and its Gemini technology, though Apple's June 8 post emphasizes its own architecture and Apple silicon optimization.

### What happened

Apple announced the third generation of Apple Foundation Models (AFM) in a June 8, 2026 post on its machine learning research site, describing a family of five models that run across devices and Apple's Private Cloud Compute. The family includes two on-device models, AFM 3 Core (the successor to Apple's roughly 3-billion-parameter dense model) and AFM 3 Core Advanced, plus three server models: AFM 3 Cloud, ADM 3 Cloud (a dedicated image model for creation, editing, and Genmoji), and AFM 3 Cloud Pro. Apple says AFM 3 Core Advanced is its most powerful on-device model, a 20-billion-parameter, natively multimodal system that uses a sparse architecture to activate only 1 to 4 billion parameters at a time depending on the request.

### Technical details

Apple presents the sparse design as what lets a 20-billion-parameter model fit on consumer hardware. The technique, which Apple describes as Instruction-Following Pruning (IFP), keeps the full parameter set in flash (NAND) storage rather than in active DRAM. Because NAND-to-DRAM bandwidth is too slow to swap weights token by token, AFM 3 Core Advanced makes its routing decision once per prompt: a lightweight dense block selects a fixed subset of parameters during initial processing, so only 1 to 4 billion parameters enter active memory for inference. AFM 3 Core, AFM 3 Core Advanced, AFM 3 Cloud, and ADM 3 Cloud are optimized for Apple silicon. AFM 3 Core Advanced requires A19 Pro (iPhone 17 Pro) or M3/M4 silicon and does not support devices with 8 GB of RAM. AFM 3 Cloud Pro, positioned for the most demanding agentic tool use and complex reasoning, is optimized for NVIDIA GPUs.
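
The per-prompt routing idea can be illustrated with a toy Python sketch. This is not Apple's implementation: the block names, the keyword-overlap scoring rule, and the `FLASH` and `dram` dictionaries are hypothetical stand-ins for flash-resident weights, the lightweight dense router, and active memory. It only shows the control flow, in which the subset is chosen and loaded once per prompt and then reused for every generated token.

```python
"""Toy sketch of per-prompt sparse routing (hypothetical, not Apple's code)."""

# Hypothetical flash-resident parameter blocks: name -> (keywords, weights).
# In a real system these would be large tensors stored in NAND.
FLASH = {
    "code":   ({"python", "bug", "function"}, [0.1, 0.2]),
    "vision": ({"image", "photo", "picture"}, [0.3, 0.4]),
    "math":   ({"sum", "integral", "proof"},  [0.5, 0.6]),
    "chat":   ({"hello", "thanks", "please"}, [0.7, 0.8]),
}


def route(prompt: str, budget: int) -> list[str]:
    """Stand-in for the lightweight dense router: score each block once
    per prompt and keep the top `budget` blocks."""
    words = set(prompt.lower().split())
    scores = {name: len(words & kws) for name, (kws, _) in FLASH.items()}
    # Sort by score (descending), then by name for a deterministic tie-break.
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return ranked[:budget]


class Session:
    """Holds the subset loaded into 'DRAM' for the lifetime of one prompt."""

    def __init__(self, prompt: str, budget: int = 2):
        self.loads = 0
        self.dram = {}
        for name in route(prompt, budget):
            self.dram[name] = FLASH[name][1]  # one simulated flash read per block
            self.loads += 1

    def decode_step(self) -> float:
        """Every token reuses the same resident blocks; no new flash reads."""
        return sum(sum(w) for w in self.dram.values())


session = Session("please fix the bug in this python function", budget=2)
outputs = [session.decode_step() for _ in range(5)]
print(sorted(session.dram), session.loads)
```

The design point the sketch captures is that the number of flash reads depends on the budget, not on the number of tokens generated, which is why a per-prompt decision tolerates slow NAND-to-DRAM bandwidth where a per-token decision would not.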

### The Google and NVIDIA partnership

Apple says it worked with Google and NVIDIA to extend Private Cloud Compute so AFM 3 Cloud Pro can run on NVIDIA GPUs in Google Cloud while preserving the same privacy guarantees Apple describes for on-device and Apple-silicon server inference, namely that user data is not stored or shared, including with Apple. A January 12, 2026 joint statement from Apple and Google characterized the next-generation AFM family as built in collaboration with Google and based on its Gemini technology and cloud infrastructure. Apple's June 8 technical post emphasizes its own model architecture and Apple-silicon optimization, and some independent reporting describes the on-device models as distilled from Gemini rather than running Gemini directly.

### Why it matters

For practitioners, the release illustrates two converging trends. First, sparse activation with flash-resident weights is becoming a practical tool for pushing larger, multimodal models onto constrained consumer silicon: IFP's approach of storing all parameters in flash and routing a subset into DRAM per prompt is a concrete example of the memory-budget tradeoffs the field is navigating. Second, even a vendor with deep in-house silicon and model capability is leaning on external frontier-model and cloud partners for its most demanding server workloads, a hybrid device-plus-cloud pattern that blends local inference with privacy-scoped cloud compute.
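
The memory-budget tradeoff can be made concrete with back-of-the-envelope arithmetic. In the sketch below, the parameter counts come from Apple's description, while the bits-per-weight values are assumptions chosen for illustration, since the article does not state how the weights are quantized. The estimate covers weights only and ignores activations and any key-value cache.

```python
"""Back-of-the-envelope weight-memory estimate (illustrative assumptions)."""

TOTAL_PARAMS = 20e9           # from the article: 20 billion parameters
ACTIVE_RANGE = (1e9, 4e9)     # from the article: 1 to 4 billion active


def weight_gib(params: float, bits_per_weight: float) -> float:
    """Storage for the weights alone, in GiB (no KV cache, no activations)."""
    return params * bits_per_weight / 8 / 2**30


# Fraction of the full model that is resident for a given prompt.
fractions = [active / TOTAL_PARAMS for active in ACTIVE_RANGE]
print(f"active fraction: {fractions[0]:.0%} to {fractions[1]:.0%}")

# ASSUMPTION: the bit widths below are hypothetical; the article gives none.
for bits in (16, 4):
    full = weight_gib(TOTAL_PARAMS, bits)
    low, high = (weight_gib(p, bits) for p in ACTIVE_RANGE)
    print(f"{bits:>2}-bit: full {full:.1f} GiB, active {low:.1f} to {high:.1f} GiB")
```

Whatever the real bit width, the resident share is the same: between 5% and 20% of the full parameter set sits in DRAM for a given prompt, with the remainder left in flash.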

### What to watch

Open questions include developer API access for on-device versus server calls, benchmarks comparing AFM 3 Core Advanced against dense and other sparse on-device models across Apple silicon generations, how the NVIDIA-GPU-in-Google-Cloud path performs and scales under Private Cloud Compute, and the real memory and latency tradeoffs for multimodal workloads that will determine how widely AFM 3 Core Advanced can be deployed.

## Scoring Rationale

Verified: Apple's third-generation AFM family is a flagship release spanning a novel 20-billion-parameter sparse on-device model and Private Cloud Compute server models, with the most capable server model running on NVIDIA GPUs in Google Cloud. It is a major, deployment-defining model release for a billion-device ecosystem and highly relevant to on-device and hybrid-inference practitioners, though it is scoped to Apple's own platform rather than marking a field-wide frontier shift.
