{"slug": "accesslens-a-blind-person-s-lanyard-powered-by-gemma-4-on-device", "title": "AccessLens — a blind person's lanyard, powered by Gemma 4 on-device", "summary": "**Summary:** AccessLens is an Android app designed for blind and low-vision users that turns a Pixel 8 phone worn on a lanyard into a persistent, on-device visual interpreter. Unlike cloud-dependent apps like Be My Eyes, it runs entirely offline using Google's Gemma 4 E2B model, automatically describing the user's surroundings when they stop walking and recognizing enrolled faces via MediaPipe without sharing images with the AI. The app also features encrypted local memory that compiles daily and weekly summaries, allowing the model to reference past context for more informed descriptions.", "body_md": "This is a submission for the Gemma 4 Challenge: Build with Gemma 4\nAccessLens is an Android app that turns a Pixel 8 worn on a lanyard into a persistent visual interpreter for blind and low-vision users. Rear camera forward, bone-conduction headphones in, the phone describes the world, and remembers.\nThe problem with existing visual-assist apps (Be My Eyes, Seeing AI, Envision) is that they are screen-bound, stateless, and cloud-bound. A blind person navigates by sound; an app that needs you to hold up a phone, tap a screen, and wait on a datacenter interrupts that signal stream. AccessLens is different on three axes:\nSettleTrigger also fires a description automatically when the user stops walking.\nSessionEvent to a SQLCipher database. A nightly Gemma 4 worker compresses each day into a DailySummary; Sundays roll into a WeeklyMemory. LONG-press prompts splice that history into the Gemma call, so the model has a world model of this specific apartment, this specific day.\nSecureRandom DB key) protect everything at rest. 
A SelfTest on first launch opens a probe DB with the wrong key and asserts the read fails before the app reports encryption healthy.\nFace recognition uses MediaPipe FaceLandmarker to produce a 192-dim L2-normalized landmark vector per enrolled person. At identify time, cosine-similar matches inject only the names into the Gemma prompt; Gemma never sees a face crop or an embedding (verified in code review).\nThree gestures, three target latencies (Pixel 8, Tensor G3): SINGLE ≤14 s end-to-end, DOUBLE scales with text length, LONG adds memory retrieval. Voice fillers (\"I'm looking…\", \"Still looking…\") cover the prefill gap so the user hears acoustic progress, not dead air. Everything runs with airplane mode on after the model is pushed once.\nAn always-on, on-device visual interpreter for blind and low-vision users, built for the DEV.to \"Build with Gemma 4\" challenge.\nA phone worn on a lanyard becomes the user's \"eyes.\" The rear camera is always on; the gyroscope watches for motion. When the user stops walking, AccessLens describes what's in front of them. When a friend whose face has been enrolled walks into frame, the phone says their name. When the user wants to read what's in front of them, they press Volume Up; for a richer description of the room, Volume Down. Bluetooth bone-conduction headphones carry the audio, so the user's ears stay free for the world.\nWhat separates AccessLens from existing apps like Be My Eyes, Seeing AI, and Envision is persistent on-device memory plus 100% on-device inference. Existing tools are stateless and cloud-bound. AccessLens runs Gemma 4 E2B locally via LiteRT-LM, encrypts…\nApache 2.0. 
The repo includes the full Kotlin/Compose source, the encryption self-test, the nightly compression WorkManager job, and a README documenting which file enforces each of the six privacy invariants.\nReference implementation that taught me the LiteRT-LM API: google-ai-edge/gallery; adapted patterns are cited inline in inference/LiteRtLmRuntime.kt.\nModel: Gemma 4 E2B (litert-community/gemma-4-E2B-it-litert-lm, ~2.59 GB int4), loaded once at service start via LiteRT-LM 0.12.0 with Backend.GPU() for the vision adapter. Three reasons E2B was the right fit:\nMultimodal in one model, on-device. Image input goes in as Content.ImageBytes, text as Content.Text, in that order (per the Gallery's \"for accurate last token\" comment), all through one Engine.generate call. No separate vision encoder + decoder to stitch, no second model to keep resident. That fits the latency budget and the memory budget on Pixel-class 8 GB RAM.\nE2B is the smallest competent multimodal Gemma 4. It fits in RAM alongside MediaPipe FaceLandmarker, a CameraX pipeline, and the Compose UI without OOM-ing on a Pixel 8. I prototyped against E4B (the brief's \"quality path\") and measured the latency lift on one-sentence scene descriptions; it was not worth doubling the prefill cost for a use case where the user is waiting in real time, lanyard-mounted, with no screen feedback. The architecture is parametric on the model path (InferenceRuntime.load(modelPath, Modality)), so a future LONG-press branch could swap to E4B in one line. I documented the tradeoff in the README and shipped E2B for all three gestures.\nGemma is the only practical way to do nightly memory compression on-device. The 03:00 CompressionWorker calls Gemma in JSON mode to compress the day's SessionEvent rows into a single DailySummary, and on Sundays into a WeeklyMemory. 
That's a real LLM task (extracting persistent facts, deduplicating recurring observations, distinguishing \"the blue mug is mine\" from \"I saw a blue mug today\") and it has to happen without a network. E2B handles it in under a minute per day on Tensor G3 while the phone is on the charger.\nTwo production fixes the brief didn't cover, in case they help someone else:\nvision_litert_compiled_model_executor.cc:273 on Tensor G3.\n<uses-native-library> declarations for libOpenCL.so, libOpenCL-car.so, libOpenCL-pixel.so (all android:required=\"false\"). Without them, Android 12+ silently denies GPU OpenCL access and the vision backend fails to initialize. Documented at ai.google.dev/edge/litert-lm/android.\nThe thing I'm proudest of: when you uninstall AccessLens, the KeyStore wrapping key is destroyed with it. The encrypted DB on disk becomes cryptographically unrecoverable. The user can throw the phone away and their memories (kitchen layout, friends' faces, places they've been) go with it. 
That's what on-device privacy is supposed to mean, and Gemma 4 + LiteRT-LM made it possible without compromising the assistant on quality.", "url": "https://wpnews.pro/news/accesslens-a-blind-person-s-lanyard-powered-by-gemma-4-on-device", "canonical_source": "https://dev.to/hassan_shah_733ea1eb37c88/accesslens-a-blind-persons-lanyard-powered-by-gemma-4-on-device-3l8b", "published_at": "2026-05-23 17:27:35+00:00", "updated_at": "2026-05-23 17:31:24.271236+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "products", "hardware"], "entities": ["Gemma 4", "AccessLens", "Pixel 8", "MediaPipe", "SQLCipher", "Tensor G3"], "alternates": {"html": "https://wpnews.pro/news/accesslens-a-blind-person-s-lanyard-powered-by-gemma-4-on-device", "markdown": "https://wpnews.pro/news/accesslens-a-blind-person-s-lanyard-powered-by-gemma-4-on-device.md", "text": "https://wpnews.pro/news/accesslens-a-blind-person-s-lanyard-powered-by-gemma-4-on-device.txt", "jsonld": "https://wpnews.pro/news/accesslens-a-blind-person-s-lanyard-powered-by-gemma-4-on-device.jsonld"}}