Show HN: Recreate Thinking Machines 276B voice demo with duct tape and 8B model A developer has built a CPU-only voice agent that replicates four key behaviors from Thinking Machines' 276B-parameter Interaction Models demo using off-the-shelf parts and commodity AI models on a single laptop. The project, which runs on one Python asyncio loop with local speech and vision processing, demonstrates friend detection, live translation, slouch detection, and search with chart generation by gluing together models like YOLO11, Silero VAD, and Llama-3.1-8B-Instruct-Turbo. The work shows how close a careful software harness can get to matching a massive custom-trained model's surface behaviors without requiring specialized hardware or training. A CPU-only voice agent that replicates the surface behaviors of Thinking Machines' Interaction Models demo May 2026 — real-time speech, vision-keyed proactivity, live translation, mid-conversation background tasks — on a laptop, with off-the-shelf parts and minimal LLM calls. The point isn't to match Thinking Machines' architecture. They trained a 276B MoE from scratch on continuous audio+video with 200ms micro-turns. This project glues commodity models together with a Python event loop and shows how close a careful harness can get on the four behaviors that demo highlighted. Speech and vision are local Silero VAD, Kroko ASR, YOLO11 pose, Piper TTS ; LLM calls go to DeepInfra Llama-3.1-8B-Instruct-Turbo for the foreground, DeepSeek-V3.2 for structured background work . One CPU laptop, one process, one asyncio loop. The four demo behaviors all run end to end on a real laptop with a real webcam and mic: Friend detection — YOLO11-pose on the webcam emits person count changed ; the registered watcher fires on the non-primary person. Live translation — Silero VAD cuts phrase-sized chunks, Whisper-large-v3-turbo on DeepInfra translates them to English, Piper speaks each chunk interpreter-style over the user. Exit is automatic when the user speaks English on an end-of-turn pause. Slouch detection — shoulder→ear vector angle off vertical, debounced over three frames 1.5s so a momentary lean doesn't fire. Search + chart with continued conversation — the foreground says "let me find those for you" while a background worker calls Serper, then DeepSeek-V3.2 for a Chart.js spec. The user can interrupt and ask follow-ups while the chart renders in the browser. Pass --no-cam to skip the camera and the YOLO load entirely; vision-keyed triggers stay in the table but don't fire automatically VisionWorker.push event still works for scripted demos . --no-audio runs from stdin without touching mic or TTS. --no-audio --no-cam together gives a headless pure-text session, which is what the integration tests use. flowchart TB O "