[ARTICLE · art-32744] src=letsdatascience.com ↗ pub= topic=large-language-models verified=true sentiment=↑ positive

CrankGPT Demonstrates Offline Hand-Cranked LLM Assistant

Squeez Labs demonstrated CrankGPT, a fully offline, hand-cranked AI voice assistant built on a Raspberry Pi 5 with 8GB RAM and a 20W generator, running local LLMs via llama.cpp. The device boots in 30 seconds and achieves sub-second to three-second time-to-first-token on models up to 1.2B parameters, highlighting the feasibility of private, cloud-free conversational AI on low-power hardware.

Published Jun 18, 2026 · 4 min read

CrankGPT is a fully offline, hand-cranked AI voice assistant built by Squeez Labs, covered by Gizmodo, Boing Boing, TechRadar, and The Register. The prototype pairs a Raspberry Pi 5 with 8GB RAM and a 20W hand-cranked generator; a custom capacitor board smooths the generator output and provides about 20 seconds of reserve power to prevent brownouts during CPU-intensive inference, per the official Squeez Labs project documentation. The system runs llama.cpp on Liquid AI LFM2 models at 350M and 1.2B parameters, and Google's Gemma 3 at 1B parameters, with Moonshine for on-device speech recognition and Piper for text-to-speech, all fully local. Boot-to-conversation takes roughly 30 seconds; time-to-first-token ranges from under a second on the 350M model to about three seconds on the 1B models, per Squeez Labs benchmarks.

What happened

CrankGPT is a fully offline, hand-powered AI voice assistant built by Squeez Labs and demonstrated in a public project writeup. The build has been covered by Gizmodo, Boing Boing, TechRadar, The Register, and Hackster.io. The standard prototype pairs a Raspberry Pi 5 (8GB of RAM) with a 20W hand-cranked generator and a custom capacitor board that smooths the generator's output and holds roughly 20 seconds of reserve power, which keeps the Pi from browning out during peak CPU draw, per the official Squeez Labs project documentation. The device boots in about 30 seconds from the first crank: roughly 10-15 seconds for the Pi 5 firmware sequence, 3 seconds for Linux boot via DietPi, and 10-15 seconds for the voice agent to load model weights (Squeez Labs).
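To get a feel for what "20 seconds of reserve power" implies, the sketch below works through the arithmetic. It assumes the reserve is sized for the roughly 15W peak draw reported elsewhere in this article; the writeup summarized here does not say whether the 20-second figure refers to peak or average load. The voltage window (12 V down to 6 V) is a purely illustrative assumption, not a value from the Squeez Labs schematic.

```python
def reserve_energy_joules(power_w: float, seconds: float) -> float:
    """Energy needed to sustain a constant load for a given time (E = P * t)."""
    return power_w * seconds


def required_capacitance_farads(energy_j: float, v_full: float, v_min: float) -> float:
    """Capacitance whose usable energy between v_full and v_min equals energy_j.

    Usable energy of an ideal capacitor: E = 0.5 * C * (v_full**2 - v_min**2).
    Converter losses and capacitor ESR are ignored.
    """
    if v_full <= v_min:
        raise ValueError("v_full must be greater than v_min")
    return 2.0 * energy_j / (v_full**2 - v_min**2)


PEAK_POWER_W = 15.0  # peak draw reported in the article
RESERVE_S = 20.0     # reserve duration reported in the article

energy_j = reserve_energy_joules(PEAK_POWER_W, RESERVE_S)

# ASSUMPTION: hypothetical 12 V -> 6 V usable window, chosen only for illustration.
capacitance_f = required_capacitance_farads(energy_j, v_full=12.0, v_min=6.0)

print(f"reserve energy: {energy_j:.0f} J")
print(f"capacitance for the assumed window: {capacitance_f:.2f} F")
```

Under these assumptions the reserve comes to a few hundred joules, which is supercapacitor territory rather than something an ordinary electrolytic capacitor could hold.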

Technical stack

Per the official Squeez Labs project documentation, inference runs on llama.cpp using Liquid AI LFM2 models at 350M or 1.2B parameters as the primary general-purpose voice agent, with Google Gemma 3 at 1B parameters as a secondary option. Speech recognition uses Moonshine ASR with Silero VAD for endpointing; Moonshine was chosen over Whisper-base-sized alternatives for its lower CPU latency. Text-to-speech runs on Piper, which synthesizes a 20-word test utterance in roughly half a second on the Pi 5; according to Squeez Labs, it was the only candidate that kept pace with streaming LLM output in real time. All components run on ONNX Runtime; PyTorch dependencies were removed to save RAM and improve startup. The OS is DietPi, a stripped-down Debian image that cuts Linux boot time to around 3 seconds.
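Keeping TTS in step with streaming LLM output usually means handing text to the synthesizer one sentence at a time instead of waiting for the full reply. The following is a minimal sketch of that idea, not Squeez Labs' implementation. The function name `sentences_from_tokens` is hypothetical, and the token list stands in for whatever the LLM runtime streams; no llama.cpp or Piper API is used or implied.

```python
import re
from typing import Iterable, Iterator

# Naive sentence boundary: terminal punctuation followed by whitespace.
# This will split on abbreviations such as "e.g. "; a real system needs more care.
_SENTENCE_END = re.compile(r"[.!?]\s")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Group a stream of text fragments into sentences as soon as each one completes."""
    buffer = ""
    for token in tokens:
        buffer += token
        # A single fragment may complete more than one sentence.
        while True:
            match = _SENTENCE_END.search(buffer)
            if match is None:
                break
            sentence = buffer[: match.end()].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence
    # Flush whatever is left once the stream ends.
    tail = buffer.strip()
    if tail:
        yield tail


# Stand-in for a streamed LLM reply (hypothetical data).
streamed = ["Hello", " there", ".", " How", " are", " you", "?", " Fine"]
chunks = list(sentences_from_tokens(streamed))
print(chunks)
```

Each yielded sentence could be passed to the synthesizer while the model keeps generating, so speech for the first sentence can begin before the reply is finished.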

Performance and power

Squeez Labs' latency benchmarks show time-to-first-token of ~0.8 seconds (LFM2 350M), ~1.5 seconds (LFM2 1.2B), and ~2.9 seconds (Gemma 3 1B). Power draw peaks at roughly 15W during combined LLM and TTS inference; current spikes of up to 5A are what prompted the custom capacitor board design (Squeez Labs). Memory bandwidth is the binding constraint on token generation rate: an Orange Pi 5 Pro with LPDDR5 RAM produces 29-58% higher generation rates than the Pi 5 with LPDDR4X, per Squeez Labs benchmarks.
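The memory-bandwidth point can be illustrated with a common back-of-the-envelope bound: if generating each token requires streaming every model weight from RAM once, the generation rate cannot exceed bandwidth divided by model size. The sketch below applies that rule of thumb. The bandwidth figure and the bits-per-weight value are round, hypothetical numbers chosen for illustration, not measurements from Squeez Labs or specifications of either board, and the bound ignores KV-cache traffic, activations, and compute limits.

```python
def model_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate in-memory size of the quantized weights, in bytes."""
    return n_params * bits_per_weight / 8.0


def max_tokens_per_second(bandwidth_bytes_per_s: float,
                          n_params: float,
                          bits_per_weight: float) -> float:
    """Upper bound on decode rate when every weight is read once per token."""
    return bandwidth_bytes_per_s / model_bytes(n_params, bits_per_weight)


N_PARAMS = 1.2e9            # 1.2B-parameter model, as in the article
BITS_PER_WEIGHT = 4.5       # ASSUMPTION: illustrative value for a ~4-bit quantization
BANDWIDTH_BYTES_S = 10e9    # ASSUMPTION: hypothetical effective bandwidth, 10 GB/s

size = model_bytes(N_PARAMS, BITS_PER_WEIGHT)
ceiling = max_tokens_per_second(BANDWIDTH_BYTES_S, N_PARAMS, BITS_PER_WEIGHT)

# The bound is linear in bandwidth: 30% more bandwidth raises the ceiling by 30%.
faster = max_tokens_per_second(1.3 * BANDWIDTH_BYTES_S, N_PARAMS, BITS_PER_WEIGHT)

print(f"weights: {size / 1e6:.0f} MB")
print(f"ceiling: {ceiling:.1f} tok/s, with 1.3x bandwidth: {faster:.1f} tok/s")
```

Because the ceiling scales linearly with bandwidth, a board with faster memory can generate tokens more quickly even with a similar CPU, which matches the direction of the reported gap between the two boards.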

Industry context

Industry-pattern analysis: CrankGPT sits at the intersection of two applied ML trends. First, runtime and quantization toolchains such as llama.cpp with Q4_K_M quants make sub-2B-parameter inference practical on CPU-only single-board hardware. Second, a growing privacy and resilience narrative around fully local inference is motivating careful engineering trade-offs around power smoothing, OS footprint, ASR/TTS latency budgets, and memory bandwidth. The Squeez Labs writeup is notable for its specificity: the team published benchmark tables, a schematic, and a component bill of materials, making the project reproducible.

Significance and limitations

The demonstration shows that useful conversational AI can run on a device with no battery, no cloud, and no accelerator, but the scope is narrow. Sub-2B-parameter models offer constrained context windows and narrower knowledge than large cloud-hosted models; CrankGPT is best read as a proof of concept for resilience, privacy, and extreme low-power edge use cases. Squeez Labs notes that faster memory and continued model efficiency improvements should make this kind of edge inference feasible on progressively cheaper, lower-power devices over time.

Quote from project documentation

"Provided the electronics are kept dry and at a reasonable temperature, there's no reason this thing won't still work in a hundred years, though you'll definitely need a fresh SD card" (Squeez Labs, CrankGPT project page).

Scoring Rationale

CrankGPT is a notable edge-ML proof of concept backed by detailed published engineering work including benchmark tables, schematics, and a full software stack writeup, making it reproducible and practically relevant for practitioners targeting offline or low-power inference. It is a niche demonstration project rather than a model release or infrastructure shift, placing it solidly in the mid-Solid tier.
