Google LiteRT-LM Accelerates Local Inference by Up to 2.2x With Gemma 4 Multi-Token Prediction
Google released LiteRT-LM, a new runtime framework that accelerates on-device inference for its Gemma 4 large language models by up to 2.2x using multi-token prediction. The framework, built on the Li…