09:00
2026-06-05
infoq.com
large-language-models
Google LiteRT-LM Speeds Up Local Inference Up to 2.2x With Gemma 4 Multi-Token Prediction
Google released LiteRT-LM, a new runtime framework that accelerates on-device inference for its Gemma 4 large language models by up to 2.2x using multi-token prediction. The framework, built on the Li…