Liquid AI LFM 2.5-230M: 230M Model Beats 1B Transformer on Edge


Source: byteiota.com · Published: 2026-06-27T08:10Z · Topic: artificial-intelligence

Liquid AI released LFM 2.5-230M, a 230-million-parameter model that outperforms larger models on data extraction benchmarks, achieving 22.51 on CaseReportBench versus 13.83 for Qwen3.5-0.8B and 2.28 for Gemma 3 1B. The model's hybrid architecture, combining gated convolutions with Grouped Query Attention, enables efficient edge deployment, running at 42 tokens per second on a Raspberry Pi 5. This marks a significant advance for lightweight AI on constrained hardware.

A 230-million-parameter model just outscored a 1-billion-parameter transformer on data extraction, and you can run it on a Raspberry Pi 5 today. Liquid AI released LFM 2.5-230M on June 25, 2026. On the CaseReportBench data extraction benchmark, it scored 22.51 against 13.83 for Alibaba’s Qwen3.5-0.8B and 2.28 for Google’s Gemma 3 1B, models with roughly 3.5x and 4.3x as many parameters, respectively. This is not a quirky benchmark result. It is an architecture story with concrete deployment implications for developers building data pipelines and edge AI systems.

What LFM 2.5-230M Is

LFM 2.5-230M is the smallest model in Liquid AI’s LFM 2.5 family. Liquid AI is a Boston-based company spun out of MIT CSAIL, founded by researchers with backgrounds in dynamical systems, signal processing, and robotics. The company raised a $250M Series A at a $2.35B valuation, led by AMD Ventures.

The model was pre-trained on 19 trillion tokens and refined through a three-stage pipeline: continual pre-training, supervised fine-tuning, and multi-stage reinforcement learning targeting tool use and structured extraction. It carries a 32,000-token context window — unusually large for a 230M model. Its stated use case is explicit: lightweight data extraction pipelines and agentic tool-calling at the edge.

The Architecture: Not a Transformer, Not Mamba

The performance gap is explained by architecture. LFM 2.5-230M is built on the LFM2 design — a hybrid that combines gated short-range convolutions with a minority of Grouped Query Attention (GQA) layers.

Standard transformers carry a KV cache that grows linearly with context length: O(n) memory for n tokens of context. On a Raspberry Pi or mobile device with constrained RAM, that scaling behavior is a hard wall. Pure state-space models like Mamba avoid the KV cache entirely but can lose long-range coherence. LFM2 splits the difference: convolution layers handle local sequence mixing with a per-step decode cost that stays constant, O(1), as the context grows, while a small number of GQA blocks preserve long-range interaction without ballooning memory usage.
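To make the memory argument concrete, here is a rough back-of-the-envelope sketch in Python. The layer counts, head sizes, channel width, and kernel size are hypothetical values picked for illustration; they are not LFM 2.5-230M's published configuration. The formulas themselves are the standard ones: a KV cache stores one key and one value vector per token, per KV head, per attention layer, while a short causal convolution only needs to remember its last few inputs.

```python
def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, context_len, bytes_per_value=2):
    """Bytes needed to hold keys and values for every attention layer."""
    # Factor of 2: one tensor for keys, one for values.
    return 2 * n_attn_layers * n_kv_heads * head_dim * context_len * bytes_per_value


def conv_state_bytes(n_conv_layers, channels, kernel_size, bytes_per_value=2):
    """Bytes for the rolling input buffers of short causal convolutions.

    Each layer keeps only its last (kernel_size - 1) inputs per channel,
    so the result does not depend on context length.
    """
    return n_conv_layers * channels * (kernel_size - 1) * bytes_per_value


# Hypothetical shapes for illustration only (assumptions, not the real model config).
TOTAL_LAYERS = 16
N_KV_HEADS = 8
HEAD_DIM = 64
CHANNELS = 1024
KERNEL_SIZE = 3


def decode_state_bytes(n_attn_layers, context_len):
    """Total per-sequence decode state when the remaining layers are convolutions."""
    n_conv_layers = TOTAL_LAYERS - n_attn_layers
    return (kv_cache_bytes(n_attn_layers, N_KV_HEADS, HEAD_DIM, context_len)
            + conv_state_bytes(n_conv_layers, CHANNELS, KERNEL_SIZE))


for context_len in (1_000, 8_000, 32_000):
    all_attention = decode_state_bytes(n_attn_layers=16, context_len=context_len)
    hybrid = decode_state_bytes(n_attn_layers=4, context_len=context_len)
    print(f"{context_len:>6} tokens: all-attention {all_attention / 2**20:7.1f} MiB, "
          f"hybrid {hybrid / 2**20:7.1f} MiB")
```

Under these assumed shapes, the attention term scales with context length while the convolution term stays fixed, so replacing most attention layers with convolutions cuts the growing part of the memory footprint roughly in proportion to the number of attention layers removed.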

The architecture layout was not hand-tuned; it was found via a hardware-in-the-loop search that optimized for quality under strict speed and memory budgets. The LFM2 technical report on arXiv covers the full methodology. The 42 tokens-per-second result on a Raspberry Pi 5 is not incidental: it follows from designing the architecture around the target hardware's constraints from the start.

Benchmarks: Where It Wins, Where It Does Not

LFM 2.5-230M is a specialist. The benchmarks reflect that clearly.

Model	Parameters	CaseReportBench	BFCLv3 (Tool Use)	IFEval
LFM 2.5-230M	230M	22.51	43.26	71.71
IBM Granite 4.0	350M	—	39.58	—
Qwen3.5-0.8B	800M	13.83	—	—
Gemma 3 1B	1,000M	2.28	16.61	63.49

On MMLU-Pro, the picture flips: Qwen3.5-0.8B scores 37.42 against LFM 2.5-230M’s 20.25. General knowledge reasoning is not this model’s territory. If you need a general-purpose assistant, use something else. If you need structured output from a data pipeline running on constrained hardware, the benchmark case is strong.
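The following Python sketch puts the reported numbers side by side as ratios, which makes the "specialist" pattern easier to see. All scores are copied from the table and the MMLU-Pro figures above; `None` marks results the article does not report, and the helper name `ratio_vs` is just an illustrative choice.

```python
# Scores as reported in the article; None means "not reported".
scores = {
    "LFM 2.5-230M": {"params_m": 230, "CaseReportBench": 22.51,
                     "BFCLv3": 43.26, "IFEval": 71.71, "MMLU-Pro": 20.25},
    "IBM Granite 4.0": {"params_m": 350, "CaseReportBench": None,
                        "BFCLv3": 39.58, "IFEval": None, "MMLU-Pro": None},
    "Qwen3.5-0.8B": {"params_m": 800, "CaseReportBench": 13.83,
                     "BFCLv3": None, "IFEval": None, "MMLU-Pro": 37.42},
    "Gemma 3 1B": {"params_m": 1000, "CaseReportBench": 2.28,
                   "BFCLv3": 16.61, "IFEval": 63.49, "MMLU-Pro": None},
}


def ratio_vs(other, key, baseline="LFM 2.5-230M"):
    """Return baseline[key] / other[key], or None if either value is missing."""
    a = scores[baseline][key]
    b = scores[other][key]
    if a is None or b is None:
        return None
    return a / b


for model in ("IBM Granite 4.0", "Qwen3.5-0.8B", "Gemma 3 1B"):
    size_ratio = scores[model]["params_m"] / scores["LFM 2.5-230M"]["params_m"]
    print(f"{model}: {size_ratio:.2f}x the parameters of LFM 2.5-230M")
    for key in ("CaseReportBench", "BFCLv3", "IFEval", "MMLU-Pro"):
        r = ratio_vs(model, key)
        if r is not None:
            print(f"  {key}: LFM 2.5-230M scores {r:.2f}x this model")
```

A ratio above 1 means LFM 2.5-230M scores higher on that benchmark; a ratio below 1, as on MMLU-Pro against Qwen3.5-0.8B, means it scores lower.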

Deployment: One Command to Start

The full local inference stack has day-one support. You can run it right now via Ollama:

ollama run hf.co/LiquidAI/LFM2.5-230M-Instruct-GGUF:Q4_K_M

GGUF checkpoints are available for llama.cpp directly, and vLLM and MLX (Apple Silicon) support shipped the same day. LM Studio and Jan offer GUI options. At Q4 quantization, the model runs in under 1 GB of RAM, so it fits on a 4 GB Raspberry Pi 5 with room left for your application stack.
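Once the Ollama command above has pulled the model, the local Ollama server can also be called from code. The sketch below is one possible way to wire up a small extraction helper in Python using only the standard library. It assumes Ollama is listening on its default local port and uses the `/api/generate` endpoint with `"format": "json"` to request JSON output. The prompt wording and the function names `build_payload` and `extract` are illustrative assumptions, not a format recommended by Liquid AI.

```python
import json
import urllib.request

# Ollama's default local endpoint (assumes a default install running on this machine).
OLLAMA_URL = "http://localhost:11434/api/generate"
# Same model tag as in the `ollama run` command above.
MODEL = "hf.co/LiquidAI/LFM2.5-230M-Instruct-GGUF:Q4_K_M"


def build_payload(text, fields):
    """Build the request body for a single non-streaming extraction call."""
    prompt = (
        "Extract the following fields from the text and reply with a JSON object "
        f"using exactly these keys: {', '.join(fields)}. "
        "Use null for anything that is not stated.\n\n"
        f"Text: {text}"
    )
    return {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,   # return one JSON response instead of a token stream
        "format": "json",  # ask Ollama to constrain the output to valid JSON
    }


def extract(text, fields, timeout=120):
    """Send the text to the local model and return the parsed JSON object."""
    data = json.dumps(build_payload(text, fields)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # The generated text is returned under the "response" key.
    return json.loads(body["response"])
```

A call such as `extract(note_text, ["age", "diagnosis", "medication"])` would return a Python dict if the model produces a valid object. For a production pipeline you would likely add schema validation and retries around it, since a small model can still return incomplete or incorrect fields.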

Hardware inference speeds from the official release: 213 tok/s on a Samsung Galaxy S25 Ultra, 42 tok/s on a Raspberry Pi 5. That second figure is interactive speed. A background extraction daemon running on a Pi at 42 tok/s is a real production option, not a proof of concept. Model checkpoints are available on Hugging Face, with community GGUF variants packaged by Unsloth on the same day.
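To translate those speeds into pipeline capacity, the short Python sketch below estimates generation time per record and records per hour. It assumes the quoted figures are decode (generation) throughput and ignores prompt-processing time, which the article does not report. The 150-token output size is a hypothetical value for one extracted JSON record.

```python
def decode_seconds(output_tokens, tokens_per_second):
    """Time to generate one output at a steady decode rate."""
    return output_tokens / tokens_per_second


def records_per_hour(output_tokens_per_record, tokens_per_second):
    """Upper bound on records per hour if the device does nothing but decode."""
    return 3600 * tokens_per_second / output_tokens_per_record


# Throughput figures quoted in the article (tokens per second).
SPEEDS = {"Raspberry Pi 5": 42, "Samsung Galaxy S25 Ultra": 213}

# Hypothetical size of one extracted record; adjust to your own schema.
OUTPUT_TOKENS = 150

for device, tok_per_s in SPEEDS.items():
    seconds = decode_seconds(OUTPUT_TOKENS, tok_per_s)
    per_hour = records_per_hour(OUTPUT_TOKENS, tok_per_s)
    print(f"{device}: {seconds:.1f} s per record, up to {per_hour:.0f} records/hour")
```

Real throughput will be lower once prompt processing, I/O, and any validation step are included, so treat these numbers as a ceiling rather than a forecast.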

Licensing

LFM 2.5-230M is free for individuals and organizations with under $10 million in annual revenue. Organizations above that threshold require an enterprise license. Both instruct and base checkpoints are available. If you are using community GGUF variants, verify the licensing terms before commercial deployment — they may differ from the official release.

What This Signals

Gartner projects that usage of small language models (SLMs) will surpass that of large language models (LLMs) by 2027. That forecast has been circulating for a while. What LFM 2.5-230M adds is not just another small transformer. It is evidence that hybrid architectures built mostly from non-attention layers can outperform larger transformers on specific tasks while running on hardware that most AI infrastructure discussions ignore entirely.

The question for developers in 2026 is not whether large models are capable; they are. It is whether your data extraction pipeline actually requires a billion-parameter transformer, or whether a 230M hybrid running at 42 tok/s on a Raspberry Pi 5 covers the work at a fraction of the cost. Based on these benchmarks, that question deserves a serious answer.

Original article: byteiota.com/liquid-ai-lfm25-230m-edge-ai/
