Gizmo Guard - Safeguard Bot (Powered by Gemma4) The Gizmo Guard is a privacy-focused, edge-based AI safety bot that uses a Raspberry Pi and ArduCam to monitor workspaces for scene changes, such as a moved mug. When motion is detected, it captures evidence images and sends them to a Spring Boot backend, which uses a locally running Gemma 4 model for multimodal image reasoning and natural-language explanations. The system is designed to run affordably without cloud infrastructure, ensuring all camera data remains local. This is a submission for the Gemma 4 Challenge: Build with Gemma 4 GizmoGuard is a low-budget, privacy-first AI-at-the-edge personal safety and monitoring bot powered by locally running Gemma models. The idea started from a simple but relatable problem: “Who moved my mug?” GizmoGuard continuously monitors a workspace — or any valuable object of interest, indoors or outdoors — using an ArduCam attached to a Raspberry Pi. The system detects scene changes such as: The system is designed to intelligently distinguish between normal environmental activity and a real scene change near the protected object. When motion or scene changes are detected, GizmoGuard captures “evidence images” and sends them to a Spring Boot backend API. The backend then uses Gemma 4 for multimodal image reasoning and natural-language explanations. Using additional preconfigured contextual information, the system can also: The entire system is built around a local-first AI architecture: GizmoGuard demonstrates how compact multimodal AI models like Gemma can power practical, privacy-focused real-world edge AI applications. The current GizmoGuard architecture consists of the following components: The Spring Boot backend acts as the orchestration layer and: Powered by Gemma 4, the AI layer: The project demonstrates how practical multimodal AI systems can run locally using affordable hardware — without requiring expensive cloud infrastructure or hosted AI services. Demo Link: Gizmo-Guard Bot Demo Demo Includes Mug placed on desk Scene continuously monitored by Raspberry Pi + ArduCam Mug moved, removed, or scene unexpectedly changes Evidence image captured automatically Gemma analyzes the image and explains what changed using multimodal reasoning When real people or images of them appear in the scene: GitHub sasiperi Repo name and Link: gizmo-guard-gemma4-challenge Tech stack includes: GizmoGuard is powered by Gemma 4B Quantized gemma4:4B-Q4 K XL running locally through Docker Model Runner DMR . I specifically selected this model because it delivered the best overall balance between: One of the primary goals of GizmoGuard was ensuring that camera images and personal workspace data never leave the local environment. By running Gemma locally: For an always-on visual monitoring system, this was extremely important. I evaluated several local multimodal models. Some lightweight models were fast but struggled with: Larger models produced strong results but required significantly more resources and slower inference times. gemma4:4B-Q4 K XL turned out to be the ideal middle ground: This made it an excellent fit for AI-at-the-edge workloads. A major advantage of Gemma4:4B was its ability to handle: within a single model. This avoided the need to chain together: Using a unified multimodal model simplified: Another goal of the project was proving that useful AI systems do not require expensive cloud GPUs or recurring API fees. Running Gemma locally means: This makes GizmoGuard practical for: GizmoGuard demonstrates how compact multimodal models like Gemma can power practical real-world edge AI applications using affordable hardware and open-source tooling. The project combines: into a fully working end-to-end system. It showcases how modern multimodal AI can move beyond cloud-only deployments and become useful directly at the edge.