# CrankGPT Demonstrates Offline Hand-Cranked LLM Assistant

> Source: <https://letsdatascience.com/news/crankgpt-demonstrates-offline-hand-cranked-llm-assistant-38ac8026>
> Published: 2026-06-18 14:54:43.312088+00:00

CrankGPT is a fully offline, hand-cranked AI voice assistant built by Squeez Labs, covered by Gizmodo, Boing Boing, TechRadar, and The Register. The prototype pairs a Raspberry Pi 5 with 8GB RAM and a 20W hand-cranked generator; a custom capacitor board smooths the generator output and provides about 20 seconds of reserve power to prevent brownouts during CPU-intensive inference, per the official Squeez Labs project documentation. The system runs llama.cpp on Liquid AI LFM2 models at 350M and 1.2B parameters, and Google's Gemma 3 at 1B parameters, with Moonshine for on-device speech recognition and Piper for text-to-speech, all running locally. Boot-to-conversation takes roughly 30 seconds; time-to-first-token ranges from under a second on the LFM2 350M model to about three seconds on Gemma 3 1B, per Squeez Labs benchmarks.

### What happened

**CrankGPT** is a fully offline, hand-powered AI voice assistant built by **Squeez Labs** and demonstrated in a public project writeup. The build has been covered by Gizmodo, Boing Boing, TechRadar, The Register, and Hackster.io. The standard prototype is a **Raspberry Pi 5** with **8GB of RAM** paired with a **20W hand-cranked generator** and a custom capacitor board that smooths the generator's output and holds roughly **20 seconds of reserve power** to prevent the Pi from browning out during peak CPU draw, per the official Squeez Labs project documentation. The device boots in about **30 seconds** from the first crank: roughly 10-15 seconds for the Pi 5 firmware sequence, 3 seconds for Linux to boot via DietPi, and 10-15 seconds for the voice agent to load model weights, which sums to 23-33 seconds (Squeez Labs).
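To get a feel for what a 20-second reserve implies, here is a back-of-the-envelope sketch. It uses two standard relations: the energy needed is load power times hold-up time, and the usable energy in a capacitor bank discharging from `v_max` to `v_min` is ½·C·(v_max² − v_min²). The load power, voltage window, and converter efficiency below are illustrative assumptions, not values from the Squeez Labs schematic, and the function name is hypothetical.

```python
def required_capacitance_farads(load_w, holdup_s, v_max, v_min, efficiency=1.0):
    """Capacitance needed to supply `load_w` watts for `holdup_s` seconds
    while the bank discharges from `v_max` to `v_min` volts through a
    converter with the given efficiency (0 < efficiency <= 1)."""
    if not (v_max > v_min >= 0):
        raise ValueError("need v_max > v_min >= 0")
    if not (0 < efficiency <= 1):
        raise ValueError("efficiency must be in (0, 1]")
    # Energy that must leave the capacitor bank, including converter losses.
    energy_j = load_w * holdup_s / efficiency
    # Usable energy between two voltages: E = 0.5 * C * (v_max^2 - v_min^2).
    return 2.0 * energy_j / (v_max ** 2 - v_min ** 2)


# ASSUMPTIONS for illustration only (not from the project documentation):
# a continuous 15 W load, a 12 V -> 6 V usable window, 90% converter efficiency.
capacitance = required_capacitance_farads(
    load_w=15.0, holdup_s=20.0, v_max=12.0, v_min=6.0, efficiency=0.9
)
print(f"required capacitance: {capacitance:.2f} F")
```

Under these assumed numbers the result is about 6.2 F. Treating the reported 15W peak as a continuous load is deliberately conservative; a lower average draw or a wider voltage window would shrink the requirement.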

### Technical stack

Per the official Squeez Labs project documentation, LLM inference runs on llama.cpp using **Liquid AI LFM2** models at **350M** or **1.2B** parameters as the primary general-purpose voice agent, with **Google Gemma 3** at **1B parameters** as a secondary option. Speech recognition uses **Moonshine** ASR with Silero VAD for endpointing, chosen for its low CPU latency over Whisper-base-sized alternatives. Text-to-speech runs on **Piper**, which synthesizes a 20-word test utterance in roughly half a second on the Pi 5; it was the only contender that kept pace with streaming LLM output in real time (Squeez Labs). The remaining components run on ONNX Runtime; PyTorch dependencies were removed to save RAM and improve startup time. The OS is **DietPi**, a stripped-down Debian image that cuts Linux boot time to around 3 seconds.
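One common way to keep text-to-speech in step with a streaming LLM is to buffer tokens until a sentence boundary and pass each finished sentence to the synthesizer while generation continues. The sketch below illustrates that idea only. The function name and the fake token stream are hypothetical, it calls neither llama.cpp nor Piper, and it is not taken from Squeez Labs' code.

```python
from typing import Iterable, Iterator

# Characters treated as sentence boundaries (a simplifying assumption).
SENTENCE_END = (".", "!", "?")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each complete sentence as soon as it appears in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        stripped = buffer.rstrip()
        if stripped.endswith(SENTENCE_END):
            yield stripped.strip()
            buffer = ""
    # Flush any trailing text that never reached a sentence boundary.
    if buffer.strip():
        yield buffer.strip()


# Stand-in for tokens arriving from a streaming LLM.
fake_stream = ["Turn", " the", " crank", ".", " Then", " ask", " a",
               " question", "!", " Bye"]

for sentence in sentences_from_tokens(fake_stream):
    # In a real pipeline this is where a TTS call would go.
    print(sentence)
```

The boundary check is intentionally naive: it would also split on abbreviations or on a token such as `3.` in a decimal number, so a real pipeline would likely need a more careful splitter.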

### Performance and power

Squeez Labs' latency benchmarks show time-to-first-token of **~0.8 seconds** (LFM2 350M), **~1.5 seconds** (LFM2 1.2B), and **~2.9 seconds** (Gemma 3 1B). Power draw peaks at roughly **15W** during combined LLM and TTS inference; current spikes of up to **5A** prompted the custom capacitor board design (Squeez Labs). Memory bandwidth is the binding constraint on token generation rates: an Orange Pi 5 Pro with LPDDR5 RAM produces 29-58% higher generation rates than the Pi 5 with LPDDR4X, per Squeez Labs benchmarks.
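The reason memory bandwidth bounds generation speed is that decoding each new token requires reading roughly all of the model's weights from RAM once, so tokens per second cannot exceed bandwidth divided by model size. The sketch below captures that first-order estimate. The bandwidth and model-size values are illustrative placeholders rather than measurements from Squeez Labs, and the function names are hypothetical.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode rate if every token reads all weights once."""
    if bandwidth_gb_s <= 0 or model_size_gb <= 0:
        raise ValueError("bandwidth and model size must be positive")
    return bandwidth_gb_s / model_size_gb


def speedup_percent(baseline_gb_s: float, faster_gb_s: float) -> float:
    """Relative gain in the bandwidth-bound rate; independent of model size."""
    if baseline_gb_s <= 0 or faster_gb_s <= 0:
        raise ValueError("bandwidths must be positive")
    return (faster_gb_s / baseline_gb_s - 1.0) * 100.0


# ASSUMED placeholder values, chosen only to illustrate the arithmetic.
model_size_gb = 0.75       # hypothetical quantized model size
baseline_bw_gb_s = 10.0    # hypothetical effective bandwidth, slower board
faster_bw_gb_s = 14.0      # hypothetical effective bandwidth, faster board

print(f"baseline bound: {max_tokens_per_second(baseline_bw_gb_s, model_size_gb):.1f} tok/s")
print(f"faster bound:   {max_tokens_per_second(faster_bw_gb_s, model_size_gb):.1f} tok/s")
print(f"speedup:        {speedup_percent(baseline_bw_gb_s, faster_bw_gb_s):.0f}%")
```

Under this simple model the relative speedup depends only on the bandwidth ratio. A spread of gains across models, such as the 29-58% range reported above, suggests that compute, cache behavior, and other factors also play a role.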

### Industry context

Industry-pattern analysis: CrankGPT sits at the intersection of two applied ML trends. First, runtime and quantization toolchains such as llama.cpp with Q4_K_M quants make sub-2B-parameter inference practical on CPU-only single-board hardware. Second, a growing privacy and resilience narrative around fully local inference is motivating careful engineering trade-offs around power smoothing, OS footprint, ASR/TTS latency budgets, and memory bandwidth. The Squeez Labs writeup is notable for its specificity: the team published benchmark tables, a schematic, and a component bill of materials, making the project reproducible.
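To illustrate why 4-bit quantization matters on an 8GB board, a model's weight footprint is roughly its parameter count times bits per weight, divided by eight. The sketch below applies that to the three parameter counts mentioned in this article. The figure of about 4.8 bits per weight for Q4_K_M is an assumed approximation (the effective rate varies by architecture), and the article does not state which quantization CrankGPT actually ships with.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    if n_params <= 0 or bits_per_weight <= 0:
        raise ValueError("n_params and bits_per_weight must be positive")
    return n_params * bits_per_weight / 8.0 / 1e9


# Parameter counts as quoted in the article.
MODELS = {
    "LFM2 350M": 350e6,
    "LFM2 1.2B": 1.2e9,
    "Gemma 3 1B": 1.0e9,
}

# ASSUMPTION: rough effective bits per weight for a Q4_K_M-style quant.
ASSUMED_BITS_PER_WEIGHT = 4.8

for name, n_params in MODELS.items():
    q4 = weight_footprint_gb(n_params, ASSUMED_BITS_PER_WEIGHT)
    fp16 = weight_footprint_gb(n_params, 16.0)
    print(f"{name}: ~{q4:.2f} GB quantized vs ~{fp16:.2f} GB at 16-bit")
```

These estimates cover weights only; the KV cache, runtime buffers, and the ASR and TTS models add to the total memory in use.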

### Significance and limitations

The demonstration shows that useful conversational AI can run on a device with no battery, no cloud, and no accelerator, but the scope is narrow. Sub-2B parameter models offer constrained context windows and reduced breadth of knowledge compared with large cloud-hosted models; CrankGPT is best read as a proof of concept for resilience, privacy, and extreme low-power edge use cases. Squeez Labs notes that faster memory and continued model efficiency improvements will lower the device cost and power needed for feasible edge inference over time.

### Quote from project documentation

"Provided the electronics are kept dry and at a reasonable temperature, there's no reason this thing won't still work in a hundred years, though you'll definitely need a fresh SD card" (Squeez Labs, CrankGPT project page).

## Scoring Rationale

CrankGPT is a notable edge-ML proof of concept backed by detailed published engineering work including benchmark tables, schematics, and a full software stack writeup, making it reproducible and practically relevant for practitioners targeting offline or low-power inference. It is a niche demonstration project rather than a model release or infrastructure shift, placing it solidly in the mid-Solid tier.
