CrankGPT Demonstrates Offline Hand-Cranked LLM Assistant

wpnews.pro


Source: letsdatascience.com · Published 2026-06-18 14:54 UTC · Topic: large-language-models


Squeez Labs demonstrated CrankGPT, a fully offline, hand-cranked AI voice assistant built on a Raspberry Pi 5 with 8GB RAM and a 20W generator, running local LLMs via llama.cpp. The device boots in about 30 seconds and reaches time-to-first-token of under one second to about three seconds on models from 350M to 1.2B parameters, showing that private, cloud-free conversational AI is feasible on low-power hardware.

4 min read

CrankGPT is a fully offline, hand-cranked AI voice assistant built by Squeez Labs and covered by Gizmodo, Boing Boing, TechRadar, and The Register. The prototype pairs a Raspberry Pi 5 with 8GB RAM and a 20W hand-cranked generator; a custom capacitor board smooths the generator output and provides about 20 seconds of reserve power to prevent brownouts during CPU-intensive inference, per the official Squeez Labs project documentation. The system runs llama.cpp on Liquid AI LFM2 models at 350M and 1.2B parameters and on Google's Gemma 3 at 1B parameters, with Moonshine for on-device speech recognition and Piper for text-to-speech, all fully local. Boot-to-conversation takes roughly 30 seconds; time-to-first-token is under a second on the 350M model, about 1.5 seconds on LFM2 1.2B, and about three seconds on Gemma 3 1B, per Squeez Labs benchmarks.

What happened

CrankGPT is a fully offline, hand-powered AI voice assistant built by Squeez Labs and demonstrated in a public project writeup. The build has been covered by Gizmodo, Boing Boing, TechRadar, The Register, and Hackster.io. The standard prototype is a Raspberry Pi 5 with 8GB of RAM paired with a 20W hand-cranked generator and a custom capacitor board that smooths the generator's output and holds roughly 20 seconds of reserve power to prevent the Pi from browning out during peak CPU draw, per the official Squeez Labs project documentation. The device boots in about 30 seconds from the first crank: roughly 10-15 seconds of Pi 5 firmware sequence, 3 seconds of Linux boot via DietPi, and 10-15 seconds for the voice agent to load model weights (Squeez Labs).
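How much capacitance does a 20-second reserve imply? The project summary above does not give the board's voltages or capacitance, so the sketch below is only a back-of-envelope estimate under hypothetical assumptions: a 15 W load (roughly the peak draw Squeez Labs reports for combined LLM and TTS inference), a capacitor bank charged to 12 V, a downstream converter that keeps regulating down to 6 V, and 90% converter efficiency. Only the energy between the two voltages is usable, since a capacitor stores E = ½CV².

```python
def required_capacitance(load_w: float, hold_s: float, v_max: float,
                         v_min: float, efficiency: float = 0.9) -> float:
    """Capacitance in farads needed to supply load_w watts for hold_s seconds.

    Usable energy when discharging from v_max to v_min is
    0.5 * C * (v_max**2 - v_min**2); the converter loses (1 - efficiency).
    """
    if not (v_max > v_min >= 0):
        raise ValueError("need v_max > v_min >= 0")
    if not (0 < efficiency <= 1):
        raise ValueError("efficiency must be in (0, 1]")
    # Energy that must come out of the capacitors to deliver load_w * hold_s.
    energy_needed_j = load_w * hold_s / efficiency
    return 2 * energy_needed_j / (v_max ** 2 - v_min ** 2)


# Hypothetical operating point; these are NOT the actual CrankGPT board values.
c = required_capacitance(load_w=15, hold_s=20, v_max=12.0, v_min=6.0)
print(f"required capacitance: {c:.2f} F")
```

Under these assumed numbers the result is a bank of roughly 6 F. Because usable energy scales with the difference of the squared voltages, widening the converter's input range shrinks the required capacitance quickly.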

Technical stack

Per the official Squeez Labs project documentation, inference runs on llama.cpp, with Liquid AI LFM2 models at 350M or 1.2B parameters as the primary general-purpose voice agent and Google Gemma 3 at 1B parameters as a secondary option. Speech recognition uses Moonshine ASR with Silero VAD for endpointing; Moonshine was chosen over Whisper-base-sized alternatives for its lower CPU latency. Text-to-speech runs on Piper, which synthesizes a 20-word test utterance in roughly half a second on the Pi 5, making it the only contender Squeez Labs tested that keeps pace with streaming LLM output in real time. The speech components run on ONNX Runtime rather than PyTorch (llama.cpp handles the LLM); removing PyTorch dependencies saves RAM and speeds up startup. The OS is DietPi, a stripped-down Debian image that cuts Linux boot time to around 3 seconds.
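Keeping TTS in step with a streaming LLM usually means speaking each sentence as soon as it is complete instead of waiting for the whole reply. The sketch below shows that chunking step in plain Python. The token stream is simulated, and the function name and the punctuation-based sentence rule are hypothetical illustrations, not code from the Squeez Labs project or part of the Piper API.

```python
import re
from typing import Iterable, Iterator

# Simplified rule: a sentence ends at ., ! or ? followed by whitespace.
# (Abbreviations such as "Dr." would be split incorrectly.)
_SENTENCE_END = re.compile(r"([.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Emit every complete sentence currently sitting in the buffer.
        while True:
            match = _SENTENCE_END.search(buffer)
            if match is None:
                break
            sentence = buffer[: match.end(1)].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence
    # Flush the trailing fragment once generation stops.
    if buffer.strip():
        yield buffer.strip()


# Simulated LLM output arriving in small pieces.
stream = ["Hel", "lo there", ". The crank", " is turning! Ask", " me anything"]
for s in sentences_from_tokens(stream):
    print(s)  # in a real pipeline, each sentence would be handed to the TTS engine
```

The first sentence can be synthesized while the model is still generating the second, which is why per-utterance TTS latency (the roughly half second reported for Piper) matters more than total reply length.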

Performance and power

Squeez Labs' latency benchmarks show time-to-first-token of ~0.8 seconds (LFM2 350M), ~1.5 seconds (LFM2 1.2B), and ~2.9 seconds (Gemma 3 1B). Power draw peaks at roughly 15W during combined LLM and TTS inference; current spikes of up to 5A prompted the custom capacitor board design (Squeez Labs). Memory bandwidth is the binding constraint on token generation rate: an Orange Pi 5 Pro with DDR5 RAM achieves 29-58% higher generation rates than the Pi 5 with DDR4, per Squeez Labs benchmarks.
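Why memory bandwidth bounds generation speed: producing each token streams essentially all of the model's weights from RAM once, so tokens per second cannot exceed bandwidth divided by model size in bytes. The sketch below computes that ceiling. The 10 GB/s effective bandwidth and the roughly 5 bits per weight for a 4-bit K-quant are illustrative assumptions, not Squeez Labs measurements, and KV-cache traffic and compute overhead are ignored, so real rates land below the ceiling.

```python
def decode_ceiling_tok_s(n_params: float, bits_per_weight: float,
                         bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s if every weight is read from RAM once per token.

    Ignores KV-cache reads, activations, and compute limits.
    """
    model_bytes = n_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / model_bytes


# Assumed inputs: ~5 bits/weight after quantization, 10 GB/s effective bandwidth.
BITS_PER_WEIGHT = 5.0
BANDWIDTH_GB_S = 10.0
for name, params in [("LFM2 350M", 350e6), ("Gemma 3 1B", 1.0e9), ("LFM2 1.2B", 1.2e9)]:
    ceiling = decode_ceiling_tok_s(params, BITS_PER_WEIGHT, BANDWIDTH_GB_S)
    print(f"{name}: <= {ceiling:.0f} tok/s")
```

With these assumed inputs the loop prints ceilings of about 46, 16, and 13 tokens per second. The bound scales linearly with bandwidth, which is consistent in direction with faster RAM yielding higher generation rates.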

Industry context

Industry-pattern analysis: CrankGPT sits at the intersection of two applied ML trends. First, runtime and quantization toolchains such as llama.cpp with Q4_K_M quants make sub-2B-parameter inference practical on CPU-only single-board hardware. Second, growing interest in privacy and resilience is driving fully local deployments, which demand careful engineering trade-offs in power smoothing, OS footprint, ASR/TTS latency budgets, and memory bandwidth. The Squeez Labs writeup is notable for its specificity: the team published benchmark tables, a schematic, and a component bill of materials, making the project reproducible.
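Quantized formats such as Q4_K_M store weights as 4-bit integers grouped into small blocks, each with its own scale. The sketch below is a simplified symmetric block quantizer in the spirit of llama.cpp's older Q4_0 format (32 weights per block, one scale per block). It is not the Q4_K_M scheme, which adds super-blocks with quantized scales and minimums, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

BLOCK = 32  # weights per block (assumed, as in simple 4-bit block formats)


def quantize_block(weights: List[float]) -> Tuple[float, List[int]]:
    """Map a block of floats to one scale plus signed 4-bit codes in [-8, 7]."""
    amax = max(abs(w) for w in weights)
    scale = amax / 7 if amax > 0 else 1.0  # largest magnitude maps to +/-7
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return scale, codes


def dequantize_block(scale: float, codes: List[int]) -> List[float]:
    return [scale * c for c in codes]


def quantize(weights: List[float]) -> List[Tuple[float, List[int]]]:
    return [quantize_block(weights[i:i + BLOCK]) for i in range(0, len(weights), BLOCK)]


# Example: 64 synthetic weights -> 2 blocks.
w = [math.sin(0.37 * i) * 0.05 for i in range(64)]
blocks = quantize(w)
restored = [x for s, codes in blocks for x in dequantize_block(s, codes)]
max_err = max(abs(a - b) for a, b in zip(w, restored))
# Storage cost if each scale were kept as a 16-bit float: 4-bit codes + 16 bits per block.
bits_per_weight = (4 * BLOCK + 16) / BLOCK
print(f"blocks={len(blocks)} max_err={max_err:.5f} bits/weight={bits_per_weight}")
```

Each reconstructed weight is within half a scale step of the original, and the storage cost of 4.5 bits per weight is a little over a quarter of FP16, which is what makes sub-2B models fit comfortably in the Pi 5's 8GB of RAM.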

Significance and limitations

The demonstration shows that useful conversational AI can run on a device with no battery, no cloud, and no accelerator, but the scope is narrow. Sub-2B parameter models offer constrained context windows and reduced breadth of knowledge compared with large cloud-hosted models; CrankGPT is best read as a proof of concept for resilience, privacy, and extreme low-power edge use cases. Squeez Labs notes that faster memory bandwidth and continued model efficiency improvements will push the feasible edge further down in device cost and power over time.

Quote from project documentation

"Provided the electronics are kept dry and at a reasonable temperature, there's no reason this thing won't still work in a hundred years, though you'll definitely need a fresh SD card" (Squeez Labs, CrankGPT project page).

Scoring Rationale

CrankGPT is a notable edge-ML proof of concept backed by detailed published engineering work, including benchmark tables, schematics, and a full software-stack writeup, making it reproducible and practically relevant for practitioners targeting offline or low-power inference. It is a niche demonstration project rather than a model release or infrastructure shift, placing it in the mid-Solid tier.


source & further reading

letsdatascience.com (original article)


Read original on letsdatascience.com → letsdatascience.com/news/crankgpt-demonstrates-o…

