Bringing Gemma 4 E2B to the Edge: Building a Privacy-First Dream Analyzer with Flutter & LiteRT

The article describes the development of Remora, a privacy-focused dream journaling app that uses Google's Gemma 4 E2B model for on-device AI analysis, avoiding cloud uploads. The engineering team faced significant challenges integrating the model into Flutter, including incompatibility with common formats like GGUF and crashes with native audio processing, requiring them to adopt LiteRT packages and build a secure fallback system for voice input. Ultimately, the project demonstrates that effective edge AI requires a fundamentally different deployment architecture than cloud-based AI.

This is a submission for the Gemma 4 Challenge: Write About Gemma 4

Every AI app says it respects your privacy. Then it uploads your most personal data to the cloud.

When we started building Remora, a dream journaling and psychological interpretation app, we faced a difficult question: How do you analyze deeply personal subconscious experiences without sending them to a remote server?

We wanted users to wake up, record a dream, and receive rich AI-powered analysis directly on their phone. No cloud inference. No persistent uploads. No centralized storage of emotional or psychological data.

That requirement immediately ruled out most modern AI architectures.

Then we discovered Gemma 4. Its compact E2B footprint, multimodal support, and mobile-first optimization made it uniquely suited for true on-device inference. But integrating cutting-edge local AI into a production Flutter app turned out to be far more challenging than expected.

This is the engineering story behind making it work.

Most mobile AI today still relies on a thin-client model:

That approach breaks down completely for sensitive psychological analysis. Dream journals often contain:

We needed:

Gemma 4 E2B gave us a realistic path toward all four.
Running directly on-device also unlocked:

Our first instinct was straightforward: Download a .gguf quantization from Hugging Face and wire it into Flutter.

That assumption lasted about five minutes. The moment the engine initialized on Android, the app crashed with:

    IllegalArgumentException: Unsupported model format: .gguf

The open-source ecosystem heavily favors .gguf because of tools like llama.cpp. But Android hardware acceleration operates in a very different ecosystem. Google’s mobile AI stack relies on:

That means models must be packaged as .task, .bin, or .litertlm, not GGUF.

Once we switched to the official LiteRT package, memory usage dropped significantly and inference stabilized immediately.

    FlutterGemma.installModel(
      modelType: ModelType.gemma4,
      fileType: ModelFileType.litertlm,
    )
    .fromNetwork(
      'https://huggingface.co/litert-community/gemma-4-E2B-it-litert-lm/resolve/main/gemma-4-E2B-it.litertlm',
    )
    .withProgress((progress) {
      print('Downloading 1.5GB Edge Model: ${progress}%');
    });

This was our first major realization: Edge AI is not just “smaller cloud AI.” It is an entirely different deployment architecture.

One of the most exciting features of Gemma 4 is native multimodal capability. Our goal was simple. Users should be able to:

We recorded audio and passed it into the local model. Immediate crash.

We switched encoders:

.m4a

Crash again.

    Failed to start streaming code: 13

After digging through Google’s AI Edge Gallery implementation, we discovered:

In practice:

Instead of abandoning voice support, we built a privacy-preserving fallback architecture. If local audio inference fails:

That preserved the most sensitive part of the workflow entirely on-device.

Future