{"slug": "crankgpt-demonstrates-offline-hand-cranked-llm-assistant", "title": "CrankGPT Demonstrates Offline Hand-Cranked LLM Assistant", "summary": "Squeez Labs demonstrated CrankGPT, a fully offline, hand-cranked AI voice assistant built on a Raspberry Pi 5 with 8GB RAM and a 20W generator, running local LLMs via llama.cpp. The device boots in 30 seconds and achieves sub-second to three-second time-to-first-token on models up to 1.2B parameters, highlighting the feasibility of private, cloud-free conversational AI on low-power hardware.", "body_md": "# CrankGPT Demonstrates Offline Hand-Cranked LLM Assistant\n\nCrankGPT is a fully offline, hand-cranked AI voice assistant built by Squeez Labs, covered by Gizmodo, Boing Boing, TechRadar, and The Register. The prototype pairs a Raspberry Pi 5 with 8GB RAM and a 20W hand-cranked generator; a custom capacitor board smooths the generator output and provides about 20 seconds of reserve power to prevent brownouts during CPU-intensive inference, per the official Squeez Labs project documentation. The system runs llama.cpp on Liquid AI LFM2 models at 350M and 1.2B parameters, and Google's Gemma 3 at 1B parameters, with Moonshine for on-device speech recognition and Piper for text-to-speech, all fully local. Boot-to-conversation takes roughly 30 seconds; time-to-first-token ranges from under a second on the 350M model to about three seconds on the 1B models, per Squeez Labs benchmarks.\n\n### What happened\n\n**CrankGPT** is a fully offline, hand-powered AI voice assistant built by **Squeez Labs** and demonstrated in a public project writeup. The build has been covered by Gizmodo, Boing Boing, TechRadar, The Register, and Hackster.io. 
The standard prototype is a **Raspberry Pi 5** with **8GB of RAM** paired with a **20W hand-cranked generator** and a custom capacitor board that smooths the generator's output and holds roughly **20 seconds of reserve power** to prevent the Pi from browning out during peak CPU draw, per the official Squeez Labs project documentation. The device boots in about **30 seconds** from the first crank: roughly 10-15 seconds of Pi 5 firmware sequence, 3 seconds of Linux boot via DietPi, and 10-15 seconds for the voice agent to load model weights (Squeez Labs).\n\n### Technical stack\n\nPer the official Squeez Labs project documentation, inference runs on llama.cpp using **Liquid AI LFM2** models at **350M** or **1.2B** parameters as the primary general-purpose voice agent, with **Google Gemma 3** at **1B parameters** as a secondary option. Speech recognition uses **Moonshine** ASR with Silero VAD for endpointing, chosen for its low CPU latency over Whisper-base-sized alternatives. Text-to-speech runs on **Piper**, which synthesizes a 20-word test utterance in roughly half a second on the Pi 5, making it the only contender that keeps pace with streaming LLM output in real time (Squeez Labs). The speech components run on ONNX Runtime; PyTorch dependencies were removed to save RAM and shorten startup time. The OS is **DietPi**, a stripped-down Debian image that cuts Linux boot time to around 3 seconds.\n\n### Performance and power\n\nSqueez Labs' latency benchmarks show time-to-first-token of **~0.8 seconds** (LFM2 350M), **~1.5 seconds** (LFM2 1.2B), and **~2.9 seconds** (Gemma 3 1B). Power draw peaks at roughly **15W** during LLM and TTS inference combined; peak current spikes of up to **5A** prompted the custom capacitor board design (Squeez Labs). 
Memory bandwidth is the binding constraint for token generation rates: an Orange Pi 5 Pro with LPDDR5 RAM produces 29-58% higher generation rates than the Raspberry Pi 5 with LPDDR4X, per Squeez Labs benchmarks.\n\n### Industry context\n\nCrankGPT sits at the intersection of two applied ML trends. First, runtime and quantization toolchains such as llama.cpp with Q4_K_M quantization make sub-2B-parameter inference practical on CPU-only single-board hardware. Second, a growing privacy and resilience narrative around fully local inference is motivating careful engineering trade-offs around power smoothing, OS footprint, ASR/TTS latency budgets, and memory bandwidth. The Squeez Labs writeup is notable for its specificity: the team published benchmark tables, a schematic, and a component bill of materials, making the project reproducible.\n\n### Significance and limitations\n\nThe demonstration shows that useful conversational AI can run on a device with no battery, no cloud, and no accelerator, but the scope is narrow. Sub-2B-parameter models offer constrained context windows and reduced breadth of knowledge compared with large cloud-hosted models; CrankGPT is best read as a proof of concept for resilience, privacy, and extreme low-power edge use cases. 
Squeez Labs notes that faster memory bandwidth and continued model efficiency improvements will push the feasible edge further down in device cost and power over time.\n\n### Quote from project documentation\n\n\"Provided the electronics are kept dry and at a reasonable temperature, there's no reason this thing won't still work in a hundred years, though you'll definitely need a fresh SD card\" (Squeez Labs, CrankGPT project page).\n\n## Scoring Rationale\n\nCrankGPT is a notable edge-ML proof of concept backed by detailed published engineering work including benchmark tables, schematics, and a full software stack writeup, making it reproducible and practically relevant for practitioners targeting offline or low-power inference. It is a niche demonstration project rather than a model release or infrastructure shift, placing it solidly in the mid-Solid tier.", "url": "https://wpnews.pro/news/crankgpt-demonstrates-offline-hand-cranked-llm-assistant", "canonical_source": "https://letsdatascience.com/news/crankgpt-demonstrates-offline-hand-cranked-llm-assistant-38ac8026", "published_at": "2026-06-18 14:54:43.312088+00:00", "updated_at": "2026-06-18 14:54:45.519129+00:00", "lang": "en", "topics": ["large-language-models", "ai-products", "ai-infrastructure", "ai-ethics", "ai-research"], "entities": ["Squeez Labs", "Raspberry Pi 5", "Liquid AI", "Google", "Moonshine", "Piper", "DietPi", "llama.cpp"], "alternates": {"html": "https://wpnews.pro/news/crankgpt-demonstrates-offline-hand-cranked-llm-assistant", "markdown": "https://wpnews.pro/news/crankgpt-demonstrates-offline-hand-cranked-llm-assistant.md", "text": "https://wpnews.pro/news/crankgpt-demonstrates-offline-hand-cranked-llm-assistant.txt", "jsonld": 
"https://wpnews.pro/news/crankgpt-demonstrates-offline-hand-cranked-llm-assistant.jsonld"}}