llada.cpp

mentions: 1 · type: Organization

// recent coverage 1 mentions

2026-06-15 04:00

arxiv.org

large-language-models

Efficient On-Device Diffusion LLM Inference with Mobile NPU

Researchers introduced llada.cpp, the first NPU-aware inference framework for accelerating diffusion large language models on smartphones, achieving 17x-42x latency reduction over CPU baselines while …

// co-occurs with top 4 entities

LLaDA-8B (1) · NPU (1) · CPU (1) · arXiv (1)