Bringing Gemma 4 E2B to the Edge: Building a Privacy-First Dream Analyzer with Flutter & LiteRT

The article describes the development of Remora, a privacy-focused dream journaling app that uses Google's Gemma 4 E2B model for on-device AI analysis, avoiding cloud uploads. The engineering team faced significant challenges integrating the model into Flutter, including incompatibility with common formats like GGUF and crashes with native audio processing, requiring them to adopt LiteRT packages and build a secure fallback system for voice input. Ultimately, the project demonstrates that effective edge AI requires a fundamentally different deployment architecture than cloud-based AI.

This is a submission for the Gemma 4 Challenge: Write About Gemma 4.

Every AI app says it respects your privacy. Then it uploads your most personal data to the cloud.

When we started building Remora, a dream journaling and psychological interpretation app, we faced a difficult question: how do you analyze deeply personal subconscious experiences without sending them to a remote server?

We wanted users to wake up, record a dream, and receive rich AI-powered analysis directly on their phone. No cloud inference. No persistent uploads. No centralized storage of emotional or psychological data. That requirement immediately ruled out most modern AI architectures.

Then we discovered Gemma 4. Its compact E2B footprint, multimodal support, and mobile-first optimization made it uniquely suited for true on-device inference. But integrating cutting-edge local AI into a production Flutter app turned out to be far more challenging than expected. This is the engineering story behind making it work.

Most mobile AI today still relies on a thin-client model: the phone sends the user's data to a remote API and waits for the result. That approach breaks down completely for sensitive psychological analysis, because dream journals often contain intimate emotional and psychological details.

We needed:

Gemma 4 E2B gave us a realistic path toward all four. Running directly on-device also unlocked:

Our first instinct was straightforward: download a .gguf quantization from Hugging Face and wire it into Flutter. That assumption lasted about five minutes. The moment the engine initialized on Android, the app crashed with:

IllegalArgumentException: Unsupported model format: .gguf

The open-source ecosystem heavily favors .gguf because of tools like llama.cpp. But Android hardware acceleration operates in a very different ecosystem. Google’s mobile AI stack relies on:

That means models must be packaged as .task, .bin, or .litertlm files, not GGUF. Once we switched to the official LiteRT package, memory usage dropped significantly and inference stabilized immediately.
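The GGUF crash only appeared once the engine was already initializing. A cheap pre-flight check could reject an incompatible file before any download or load happens. Here is a minimal sketch of that check. It assumes that the three extensions listed above are the complete set this runtime accepts. The names `supportedEdgeModelExtensions`, `modelExtension`, `isSupportedEdgeModel`, and `ensureSupportedEdgeModel` are hypothetical helpers for illustration. They are not part of LiteRT or flutter_gemma.

```dart
/// Hypothetical allow-list: the model packagings the LiteRT-based runtime
/// accepted in our testing (.gguf is deliberately absent).
const Set<String> supportedEdgeModelExtensions = {'.task', '.bin', '.litertlm'};

/// Returns the lower-cased extension of [pathOrUrl] including the dot,
/// or '' if there is none. For URLs, ?query and #fragment are ignored.
String modelExtension(String pathOrUrl) {
  // Uri.tryParse separates the path from any query string; fall back to
  // the raw string if it cannot be parsed.
  final String path = Uri.tryParse(pathOrUrl)?.path ?? pathOrUrl;
  final String fileName = path.split('/').last;
  final int dot = fileName.lastIndexOf('.');
  // No dot at all, or a dotfile such as ".hidden": no extension.
  if (dot <= 0) return '';
  return fileName.substring(dot).toLowerCase();
}

/// True if the file looks like a packaging the on-device runtime can load.
bool isSupportedEdgeModel(String pathOrUrl) =>
    supportedEdgeModelExtensions.contains(modelExtension(pathOrUrl));

/// Throws a readable ArgumentError instead of letting the native engine
/// fail later with an "Unsupported model format" exception.
void ensureSupportedEdgeModel(String pathOrUrl) {
  if (!isSupportedEdgeModel(pathOrUrl)) {
    final String ext = modelExtension(pathOrUrl);
    final String shown = ext.isEmpty ? '(none)' : ext;
    final String expected = supportedEdgeModelExtensions.join(', ');
    throw ArgumentError.value(
      pathOrUrl,
      'pathOrUrl',
      'Unsupported on-device model format "$shown"; '
          'expected one of $expected',
    );
  }
}
```

In an install flow, `ensureSupportedEdgeModel(url)` would run just before the model is handed to the installer. A mistaken GGUF link would then fail fast with a clear message instead of crashing during engine initialization. The check only looks at the file name, so it complements the engine's own validation rather than replacing it.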
```dart
FlutterGemma.installModel(
  modelType: ModelType.gemma4,
  // Gemma 4 E2B is distributed as a LiteRT-LM bundle, not GGUF.
  fileType: ModelFileType.litertlm,
)
    .fromNetwork(
      'https://huggingface.co/litert-community/gemma-4-E2B-it-litert-lm/resolve/main/gemma-4-E2B-it.litertlm',
    )
    .withProgress((progress) {
      print('Downloading 1.5GB Edge Model: $progress%');
    });
```

This was our first major realization: Edge AI is not just “smaller cloud AI.” It is an entirely different deployment architecture.

One of the most exciting features of Gemma 4 is native multimodal capability. Our goal was simple. Users should be able to:

We recorded audio and passed it into the local model. Immediate crash. We switched encoders and tried .m4a. Crash again:

Failed to start streaming code: 13

After digging through Google’s AI Edge Gallery implementation, we discovered:

In practice:

Instead of abandoning voice support, we built a privacy-preserving fallback architecture. If local audio inference fails, the recording is sent to our backend for transcription only, and the returned text is then analyzed by Gemma on the device. That preserved the most sensitive part of the workflow entirely on-device.

```dart
import 'dart:io';

// `localEngine` (our wrapper around the on-device Gemma engine) and `dio`
// (a Dio client pointed at our backend) are fields of the enclosing class.
Future<DreamAnalysisResult> analyzeAudio(String filePath) async {
  // 1. Preferred path: native audio understanding, fully on-device.
  if (localEngine.isReady) {
    try {
      return await localEngine.analyzeAudio(filePath);
    } catch (e) {
      // Typically the "code 13" streaming failure, but any error falls back.
      print('Local audio inference failed ($e). Engaging secure fallback.');
    }
  }

  // 2. Fallback: the backend only transcribes the recording.
  final bytes = await File(filePath).readAsBytes();
  final response = await dio.post('/dreams/transcribe', data: bytes);
  final String text = response.data['transcription'];

  // 3. The dream interpretation itself still runs locally on Gemma.
  final result = await localEngine.analyzeDream(text);
  return DreamAnalysisResult(
    title: result.title,
    interpretation: result.interpretation,
    tags: [...result.tags, 'voice log'],
    transcribedText: text,
  );
}
```

This ended up becoming one of the most important architectural decisions in the app. Not because it was perfect, but because it degraded gracefully while preserving privacy guarantees.

During development we tested inference using the Android emulator. Everything failed instantly:

Connection closed before full header was received

At first we suspected:

None of those were the problem. The real issue was architecture mismatch.
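The mismatch here is between the x86 emulator and the arm64-v8a ABI that LiteRT-LM delegates target. One way to make the failure explicit is to check the ABI before starting the local engine, instead of waiting for an opaque "Connection closed" error. The following is a minimal sketch under that assumption. It uses the real `Abi` class from `dart:ffi`. The gate itself (`canRunLocalEngine`, `canRunLocalEngineHere`, `localEngineStatus`) is a hypothetical policy, not part of LiteRT.

```dart
import 'dart:ffi' show Abi;

/// Hypothetical gate: LiteRT-LM delegates target arm64-v8a, so only treat the
/// on-device engine as usable on that ABI. x86 and x86_64 emulator images
/// (Abi.androidIA32, Abi.androidX64) are kept away from it.
bool canRunLocalEngine(Abi abi) => abi == Abi.androidArm64;

/// The same check for the process the app is actually running in.
bool canRunLocalEngineHere() => canRunLocalEngine(Abi.current());

/// Hypothetical developer-facing status line for logs or a debug screen.
String localEngineStatus(Abi abi) => canRunLocalEngine(abi)
    ? 'On-device Gemma available ($abi)'
    : 'On-device Gemma disabled: $abi is not arm64-v8a. '
        'Test on a physical arm64 device.';
```

An arm64 emulator image would pass this gate. It only guards against the x86 case described here and is not a substitute for testing on real hardware.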
LiteRT-LM delegates are optimized specifically for arm64-v8a. The x86 emulator environment simply could not execute the delegate stack correctly. Once we moved testing onto a physical Pixel device:

That moment changed how we approached mobile AI QA entirely. Edge AI development without real hardware is basically guesswork.

Downloading a 1.5GB local model works, but it is not the ideal long-term UX. Large bundled models create:

To future-proof the architecture, we integrated Android AI Core support. Before downloading Gemma 4 locally, Remora now checks whether:

If available:

This creates a hybrid architecture where:

Working with Gemma 4 fundamentally changed how we think about mobile apps. For years, mobile AI has largely meant “call an API and wait.” But local multimodal models enable something very different:

The tooling ecosystem is still early. The documentation is fragmented. The hardware constraints are real. But the direction is obvious: edge AI is becoming a first-class application platform. And Gemma 4 is one of the first models that genuinely makes that future practical for mobile developers.

Remora started as an experiment: could we build a psychologically meaningful AI experience without compromising user privacy? Thanks to Gemma 4, LiteRT, and Android’s emerging edge AI ecosystem, the answer is increasingly yes.

We still have challenges ahead:

But for the first time, building truly private multimodal AI apps on smartphones feels achievable. And that changes everything.

What challenges you the most in your edge AI journey?