
grep -l @cpu /news/*.json | wc -l → 10

CPU

mentions 10 · type Organization

// recent coverage 10 mentions

03:34

2026-07-08

dev.to

large-language-models

CPU vs GPU: Why Large Language Models Need GPUs — What Really Happens After You Press Enter?

A developer explains the technical journey from pressing Enter to receiving a response from a large language model (LLM) like ChatGPT, Gemini, or Claude. The process involves tokenization, embedding, …

14:10

2026-07-04

letsdatascience.com

ai-chips

TSMC Strengthens Credit Outlook Through AI Chip Leadership

S&P Global Ratings revised its outlook on TSMC's AA- long-term issuer credit rating to Positive on June 23, citing stronger leadership in advanced high-performance computing chips. S&P estimated AI pr…

05:14

2026-06-30

moondream.ai

ai-infrastructure

Popping the GPU Bubble

Moondream HQ reveals that GPUs often sit idle during AI model inference due to CPU overhead, a phenomenon called the 'GPU bubble.' The company's Photon system uses pipelined decoding to overlap CPU an…

04:34

2026-06-27

news.ycombinator.com

artificial-intelligence

Ask HN: How much memory is useable by GPU in MacBook?

MacBook GPU memory allocation depends on total system memory: machines with less than 36GB let the GPU use up to 66% of it, while machines with 36GB or more allow 75%. Users can raise the allocation via Terminal to run AI models. Shared memory el…

00:18

2026-06-26

extropic.ai

ai-infrastructure

Thermodynamic Computing from Zero to One

Extropic unveiled thermodynamic computing hardware and algorithms that run generative AI workloads using radically less energy than GPUs. The company released its `thrml` library and plans to build a …

20:40

2026-06-15

gilesthomas.com

machine-learning

Jax: Commitment Issues

JAX's default_device context manager places arrays on the specified device but does not commit them, allowing JAX to move them to other devices. This caused array lookups to take over a second by trig…

12:20

2026-06-15

dev.to

artificial-intelligence

The CPU Is Back in the Stack — and Nobody Budgeted for It

Agentic AI workloads are shifting the compute ratio, making CPUs the critical coordination substrate rather than a support component for GPUs. This inversion is exposed by current Xeon supply tightnes…

04:00

2026-06-15

arxiv.org

large-language-models

Efficient On-Device Diffusion LLM Inference with Mobile NPU

Researchers introduced llada.cpp, the first NPU-aware inference framework for accelerating diffusion large language models on smartphones, achieving 17x-42x latency reduction over CPU baselines while …

02:47

2026-06-06

dev.to

artificial-intelligence

What Is Ollama? The Complete Guide to Running LLMs Locally in 2026

Ollama, an open-source runtime for large language models, enables users to run models locally on Mac, Windows, or Linux with a single command, eliminating the need for cloud dependencies or complex en…

18:56

2026-04-30

pytorch.org

large-language-models

SMG: The Case for Disaggregating CPU from GPU in LLM Serving

Shepherd Model Gateway (SMG) has disaggregated all CPU-bound workloads from GPU inference in large language model serving, moving tokenization, detokenization, and parsing into a dedicated Rust gatewa…

// co-occurs with top 8 entities

GPU (9) · VRAM (2) · Shepherd Model Gateway (1) · SGLang (1) · vLLM (1) · Python (1) · Rust (1) · GIL (1)