{"slug": "liquid-ai-releases-a-230m-model-optimized-for-phones-raspberry-pi-and-robots", "title": "Liquid AI releases a 230M model optimized for phones, Raspberry Pi, and robots", "summary": "Liquid AI released LFM2.5-230M, a 230-million-parameter model optimized for edge devices including phones, Raspberry Pi, and robots. The model achieves 213 tokens per second on a Galaxy S25 Ultra and 42 tokens per second on a Raspberry Pi 5, and was demonstrated controlling a Unitree G1 humanoid robot via an NVIDIA Jetson Orin. Despite its small size, it competes with models twice as large on benchmarks for tool use and data extraction.", "body_md": "# LFM2.5-230M: Built to Run Anywhere\n\nToday, we're releasing **LFM2.5-230M**, our smallest model yet. It’s a fast, lightweight foundation for developers to fine-tune and deploy in agentic workflows. Built on the LFM2 architecture, it delivers exceptionally fast inference and runs everywhere, from cloud GPUs to low-cost CPUs (213 tok/s decode speed on Galaxy S25 Ultra, 42 tok/s on a Raspberry Pi 5). Despite its small size, it’s surprisingly capable at tool use and data extraction tasks.\n\nThe base (LFM2.5-230M-Base) and post-trained (LFM2.5-230M) models are available today on [Hugging Face](https://huggingface.co/LiquidAI/LFM2.5-230M). Check out our [docs](https://docs.liquid.ai/) on how to run and fine-tune them locally.\n\n## Training & Fine-tuning\n\nThe model was pre-trained for 19T tokens, including a 32K context extension phase. We apply a lightweight post-training recipe designed to preserve flexibility for developers targeting their own downstream applications.\n\nThe recipe consists of three stages: **(1) supervised fine-tuning with distillation from LFM2.5-350M, (2) direct preference optimization, and (3) multi-domain reinforcement learning**. 
The final checkpoint balances strong out-of-the-box capabilities with adaptability to downstream specialization, while remaining competitive with larger models.\n\nAs an early look at ongoing work, we deployed LFM2.5-230M on a Unitree G1 humanoid robot, running entirely on-device on its onboard NVIDIA Jetson Orin. Here the model acts as a skill-selection layer: it takes a single natural-language instruction and decomposes it into a sequence of tool calls that invoke pre-trained low-level skills provided by NVIDIA's SONIC framework. After a quick fine-tune for this task, the model turns a free-form command such as\n\n\"Hold still for 2 seconds, then walk forward at 1 meter per second for 3 meters, hold a forward one-leg kneel for 5 seconds, and walk backward at 0.5 meters per second for 3 meters\"\n\ninto a structured, multi-step plan, chaining skills like timed walking at a target velocity and a one-legged kneel. While the behaviors are deliberately simple at this stage, we think it's a compelling signal: a 230M-parameter model can be quickly fine-tuned and deployed on-device to serve as the natural-language control interface for a humanoid.\n\n## Benchmarks\n\nWe evaluated LFM2.5-230M across ten benchmarks covering both core capabilities and applied tasks. 
Despite its size, it **competes with and often beats models more than twice as large** across knowledge (GPQA Diamond, MMLU-Pro), instruction following (IFEval, IFBench, Multi-IF), data extraction (CaseReportBench), and tool use (BFCLv3, BFCLv4, τ²-Bench Telecom and Retail).\n\n| Model | GPQA Diamond | MMLU-Pro | IFEval | IFBench | Multi-IF |\n|---|---|---|---|---|---|\n| LFM2.5-230M | 25.41 | 20.25 | 71.71 | 38.40 | 37.70 |\n| LFM2.5-350M | 30.64 | 20.01 | 76.96 | 40.69 | 44.92 |\n| LFM2-350M | 27.58 | 19.29 | 64.96 | 18.20 | 32.92 |\n| Granite 4.0-H-350M | 22.32 | 13.14 | 61.27 | 17.22 | 28.70 |\n| Granite 4.0-350M | 25.91 | 12.84 | 53.48 | 15.98 | 24.21 |\n| Qwen3.5-0.8B (Instruct) | 27.41 | 37.42 | 59.94 | 22.87 | 41.68 |\n| Gemma 3 1B IT | 23.89 | 14.04 | 63.49 | 20.33 | 44.25 |\n\n| Model | CaseReportBench | BFCLv3 | BFCLv4 | τ²-Bench Telecom | τ²-Bench Retail |\n|---|---|---|---|---|---|\n| LFM2.5-230M | 22.51 | 43.26 | 21.03 | 5.26 | 13.68 |\n| LFM2.5-350M | 32.45 | 44.11 | 21.86 | 18.86 | 17.84 |\n| LFM2-350M | 11.67 | 22.95 | 12.29 | 10.82 | 5.56 |\n| Granite 4.0-H-350M | 12.44 | 43.07 | 13.28 | 13.74 | 6.14 |\n| Granite 4.0-350M | 0.84 | 39.58 | 13.73 | 2.92 | 6.14 |\n| Qwen3.5-0.8B (Instruct) | 13.83 | 35.08 | 18.70 | 12.57 | 6.14 |\n| Gemma 3 1B IT | 2.28 | 16.61 | 7.17 | 9.36 | 6.43 |\n\nThis makes LFM2.5-230M an ideal solution to power large-scale data extraction pipelines or lightweight on-device agentic workloads. However, given its compact size, we do not recommend it for reasoning-heavy workloads such as advanced math, code generation, or creative writing.\n\n## Fast Inference Everywhere\n\nLFM2.5-230M ships with day-one support across the inference ecosystem:\n\n- **llama.cpp**: GGUF checkpoints for efficient edge inference\n- **MLX**: Optimized inference for Apple Silicon\n- **vLLM**: GPU-accelerated serving for production throughput\n- **SGLang**: GPU-accelerated serving for production throughput\n- **ONNX**: Cross-platform inference across diverse accelerators\n\n**CPU inference.** Thanks to the efficient LFM2 architecture, LFM2.5-230M is considerably faster than similar-sized models, including SSM hybrids and Gated Delta Networks. 
On both a Raspberry Pi 5 and a Qualcomm Snapdragon Gen4 (Samsung Galaxy S25 Ultra), it delivers the highest prefill and decode throughput in its class while keeping the smallest memory footprint. We tune the flash-attention flag per device to maximize prefill on each platform: enabled (-fa 1) on the Raspberry Pi 5 and disabled (-fa 0) on the Snapdragon Gen4.\n\n**GPU inference.** For production-grade enterprise deployments, we have also developed an internal GPU inference stack that delivers extremely low-latency serving. We benchmark it against other small models running on SGLang, and across all concurrency levels, LFM2.5 models achieve considerably lower end-to-end latency.\n\n## Get Started\n\nStart building today with LFM2.5-230M and LFM2.5-230M-Base, available on [Hugging Face](https://huggingface.co/LiquidAI/LFM2.5-230M).\n\nWith LFM2.5, we're delivering on our vision of AI that runs anywhere. These models are:\n\n- **Open-weight**: Download, fine-tune, and deploy without restrictions\n- **Fast from day one**: Native support for llama.cpp, NexaSDK, MLX, and vLLM across Apple, AMD, Qualcomm, and NVIDIA hardware\n- **A complete family**: From base models for customization to specialized audio and vision variants, one architecture covers diverse use cases\n\nThe edge AI future is here. 
We can't wait to see what you build.\n\n[Download on Hugging Face](https://huggingface.co/LiquidAI/LFM2.5-230M)\n\n[Read our docs](https://docs.liquid.ai/)", "url": "https://wpnews.pro/news/liquid-ai-releases-a-230m-model-optimized-for-phones-raspberry-pi-and-robots", "canonical_source": "https://www.liquid.ai/blog/lfm2-5-230m", "published_at": "2026-07-01 07:03:12+00:00", "updated_at": "2026-07-01 07:19:36.056400+00:00", "lang": "en", "topics": ["ai-products", "ai-tools", "robotics", "ai-infrastructure"], "entities": ["Liquid AI", "LFM2.5-230M", "Hugging Face", "Unitree G1", "NVIDIA Jetson Orin", "NVIDIA", "Galaxy S25 Ultra", "Raspberry Pi 5"], "alternates": {"html": "https://wpnews.pro/news/liquid-ai-releases-a-230m-model-optimized-for-phones-raspberry-pi-and-robots", "markdown": "https://wpnews.pro/news/liquid-ai-releases-a-230m-model-optimized-for-phones-raspberry-pi-and-robots.md", "text": "https://wpnews.pro/news/liquid-ai-releases-a-230m-model-optimized-for-phones-raspberry-pi-and-robots.txt", "jsonld": "https://wpnews.pro/news/liquid-ai-releases-a-230m-model-optimized-for-phones-raspberry-pi-and-robots.jsonld"}}