{"slug": "liquid-ai-lfm-2-5-230m-230m-model-beats-1b-transformer-on-edge", "title": "Liquid AI LFM 2.5-230M: 230M Model Beats 1B Transformer on Edge", "summary": "Liquid AI released LFM 2.5-230M, a 230-million-parameter model that outperforms larger models on data extraction benchmarks, achieving 22.51 on CaseReportBench versus 13.83 for Qwen3.5-0.8B and 2.28 for Gemma 3 1B. The model's hybrid architecture, combining gated convolutions with Grouped Query Attention, enables efficient edge deployment, running at 42 tokens per second on a Raspberry Pi 5. This marks a significant advance for lightweight AI on constrained hardware.", "body_md": "A 230-million-parameter model just outscored a 1-billion-parameter transformer on data extraction — and you can run it on a Raspberry Pi 5 today. Liquid AI released **LFM 2.5-230M** on June 25, 2026. On the CaseReportBench data extraction benchmark, it scored 22.51 against Alibaba’s Qwen3.5-0.8B (13.83) and Google’s Gemma 3 1B (2.28) — models with roughly 3.5x and 4.3x as many parameters, respectively. This is not a quirky benchmark result. It is an architecture story with concrete deployment implications for developers building data pipelines and edge AI systems.\n\n## What LFM 2.5-230M Is\n\nLFM 2.5-230M is the smallest model in Liquid AI’s LFM 2.5 family. [Liquid AI](https://www.liquid.ai/blog/lfm2-5-230m) is a Boston-based company spun out of MIT CSAIL, founded by researchers with backgrounds in dynamical systems, signal processing, and robotics. The company raised a $250M Series A round at a $2.35B valuation, led by AMD Ventures.\n\nThe model was pre-trained on 19 trillion tokens and refined through a three-stage pipeline: continual pre-training, supervised fine-tuning, and multi-stage reinforcement learning targeting tool use and structured extraction. It carries a 32,000-token context window — unusually large for a 230M model. 
Its stated use case is explicit: lightweight data extraction pipelines and agentic tool-calling at the edge.\n\n## The Architecture: Not a Transformer, Not Mamba\n\nThe performance gap is explained by architecture. LFM 2.5-230M is built on the **LFM2** design — a hybrid that combines gated short-range convolutions with a minority of Grouped Query Attention (GQA) layers.\n\nStandard transformers carry a KV cache that grows with context length (O(n) memory). On a Raspberry Pi or mobile device with constrained RAM, that scaling behavior is a hard wall. Pure state-space models like Mamba avoid the KV cache entirely but lose long-range coherence. LFM2 splits the difference: convolution layers handle local sequence mixing at O(1) per-step decode cost, while a small number of GQA blocks preserve long-range interaction without ballooning memory usage.\n\nThe architecture layout was not hand-tuned — it was found via a hardware-in-the-loop search that optimized for quality under strict speed and memory budgets. The [LFM2 technical report on arXiv](https://arxiv.org/pdf/2511.23404) covers the full methodology. The 42 tokens-per-second result on a Raspberry Pi 5 is not incidental — it is what happens when architecture constraints are designed around the target hardware from the start.\n\n## Benchmarks: Where It Wins, Where It Does Not\n\nLFM 2.5-230M is a specialist. The benchmarks reflect that clearly.\n\n| Model | Parameters | CaseReportBench | BFCLv3 (Tool Use) | IFEval |\n|---|---|---|---|---|\n| LFM 2.5-230M | 230M | 22.51 | 43.26 | 71.71 |\n| IBM Granite 4.0 | 350M | — | 39.58 | — |\n| Qwen3.5-0.8B | 800M | 13.83 | — | — |\n| Gemma 3 1B | 1,000M | 2.28 | 16.61 | 63.49 |\n\nOn MMLU-Pro, the picture flips: Qwen3.5-0.8B scores 37.42 against LFM 2.5-230M’s 20.25. General knowledge reasoning is not this model’s territory. If you need a general-purpose assistant, use something else. 
If you need structured output from a data pipeline running on constrained hardware, the benchmark case is strong.\n\n## Deployment: One Command to Start\n\nThe full local inference stack has day-one support. You can run it right now via Ollama:\n\n```\nollama run hf.co/LiquidAI/LFM2.5-230M-Instruct-GGUF:Q4_K_M\n```\n\nGGUF checkpoints are available for **llama.cpp** directly, with **vLLM** and **MLX** (Apple Silicon) support shipping on the same day. LM Studio and Jan offer GUI options. At Q4 quantization, the model runs in under 1GB of RAM — it fits on a Raspberry Pi 5 4GB with room left for your application stack.\n\nHardware inference speeds from the official release: 213 tok/s on a Samsung Galaxy S25 Ultra, 42 tok/s on a Raspberry Pi 5. That second figure is interactive speed. A background extraction daemon running on a Pi at 42 tok/s is a real production option, not a proof of concept. Model checkpoints are available on [Hugging Face](https://huggingface.co/LiquidAI/LFM2.5-230M), with community GGUF variants packaged by Unsloth on the same day.\n\n## Licensing\n\nLFM 2.5-230M is free for individuals and organizations with under $10 million in annual revenue. Organizations above that threshold require an enterprise license. Both instruct and base checkpoints are available. If you are using community GGUF variants, verify the licensing terms before commercial deployment — they may differ from the official release.\n\n## What This Signals\n\nGartner projects SLM usage will surpass LLM usage by 2027. That forecast has been circulating for a while. 
What LFM 2.5-230M adds is not just another small transformer — it is evidence that [non-transformer architectures can outperform larger transformers on specific tasks](https://venturebeat.com/technology/liquid-ais-smallest-model-yet-lfm2-5-230m-beats-models-4x-its-size-at-data-extraction-can-run-anywhere/) while running on hardware that most AI infrastructure discussions ignore entirely.\n\nThe question for developers in 2026 is not whether large models are capable — they are. It is whether your data extraction pipeline actually requires a billion-parameter transformer, or whether a 230M hybrid running at 42 tok/s on a low-cost Raspberry Pi 5 covers the work at a fraction of the cost. Based on these benchmarks, that question deserves a serious answer.", "url": "https://wpnews.pro/news/liquid-ai-lfm-2-5-230m-230m-model-beats-1b-transformer-on-edge", "canonical_source": "https://byteiota.com/liquid-ai-lfm25-230m-edge-ai/", "published_at": "2026-06-27 08:10:51+00:00", "updated_at": "2026-06-27 08:40:16.833304+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "ai-products", "ai-infrastructure", "ai-research"], "entities": ["Liquid AI", "MIT CSAIL", "AMD Ventures", "Alibaba", "Google", "Qwen3.5-0.8B", "Gemma 3 1B", "Raspberry Pi 5"], "alternates": {"html": "https://wpnews.pro/news/liquid-ai-lfm-2-5-230m-230m-model-beats-1b-transformer-on-edge", "markdown": "https://wpnews.pro/news/liquid-ai-lfm-2-5-230m-230m-model-beats-1b-transformer-on-edge.md", "text": "https://wpnews.pro/news/liquid-ai-lfm-2-5-230m-230m-model-beats-1b-transformer-on-edge.txt", "jsonld": "https://wpnews.pro/news/liquid-ai-lfm-2-5-230m-230m-model-beats-1b-transformer-on-edge.jsonld"}}