According to 9to5Mac's coverage of a report by The Information, Apple will route some Siri queries to Google Cloud and run them on Nvidia Blackwell B200 GPUs as part of a licensed use of Google's Gemini model. The Information also reports that Apple has approved the use of Nvidia's confidential compute technology to encrypt data while it is processed on the chips. 9to5Mac quotes The Information as saying the move diverges from Apple's prior attempt to control all critical product ingredients, and that it is unclear how Apple's previously launched server system will fit into the upcoming Siri rollout. Nvidia is quoted describing its confidential compute feature as preserving "the confidentiality and integrity of AI models deployed on Rubin, Blackwell, and Hopper GPUs."

What happened

The Information reports, via 9to5Mac, that Apple will route some user queries for a new version of Siri to Google Cloud and run them on Nvidia Blackwell B200 GPUs as part of a licensed deployment of Google's Gemini model. The Information also reports that Apple has approved the use of Nvidia's confidential compute feature to encrypt data while it is being processed on those GPUs. The Information, quoted in 9to5Mac, describes this choice as diverging from Apple's previous approach of controlling the full stack, and notes uncertainty about how Apple's previously launched server system will be used in the upcoming Siri product launch.

Editorial analysis - technical context

Nvidia publicly presents the Blackwell B200 as a data-center GPU designed for large-scale model training and inference; vendors describe Blackwell as the successor to Hopper, with improvements in inference throughput, memory bandwidth, and multi-GPU scaling.
Nvidia's confidential compute is a hardware-based security capability that isolates and encrypts data during on-chip processing; Nvidia is quoted saying it "preserves the confidentiality and integrity of AI models deployed on Rubin, Blackwell, and Hopper GPUs," enabling sensitive workloads to run in shared cloud environments with near-native performance.

Industry context

Companies deploying large foundation models often balance on-device execution against cloud-hosted inference, trading off latency, model capacity, and privacy. Editorial analysis: industry observers note that using cloud-hosted GPUs with confidential compute is a growing pattern for organizations that need access to very large models but also want cryptographic protections for data during processing. Editorial analysis: relying on a cloud provider's GPU fleet can accelerate access to cutting-edge hardware while introducing operational dependencies on the cloud vendor and the GPU vendor ecosystem.

What to watch

For practitioners, useful indicators include:
• the latency and cost profile for queries routed to cloud-hosted Gemini inference on Blackwell hardware;
• technical documentation or SOC/attestation details showing how confidential compute is implemented and audited;
• the division of workloads between on-device Siri components and cloud-based Gemini inference;
• any public details about how Apple's existing server hardware will interoperate with Google Cloud-hosted inference.

Scoring Rationale

Notable to practitioners because it documents a major consumer-device vendor using cloud-hosted, vendor-grade GPUs and confidential compute for assistant inference, affecting deployment, latency, and privacy tradeoffs.