{"slug": "l-e-n-s-a-private-photography-coach-for-blind-and-low-vision-artisans", "title": "L.E.N.S. — A private photography coach for blind and low-vision artisans", "summary": "L.E.N.S. (Local Edge Native Studio) is a voice-guided photography coach designed for blind and low-vision artisans, such as hand-knit sweater makers, to independently verify and improve product photos before listing items online. The application runs Google's Gemma 4 multimodal AI model locally through Ollama, analyzing photos without sending images to the cloud, and provides structured voice feedback one fix at a time. Built as a privacy-focused, offline-first tool, L.E.N.S. allows users to assess focus, lighting, and composition of their product photos without needing assistance from a sighted person.", "body_md": "**L.E.N.S.** (Local Edge Native Studio) is a voice-guided photography coach that runs **Gemma 4 E4B** locally through **Ollama** — so a maker can verify and improve product photos before listing, without sending images to the cloud and without asking someone sighted to “just check this one.”\n\nGemma 4’s **native multimodal vision** is the engine: each coaching turn sends a real product photo (base64 in the Ollama chat) and gets back structured JSON the app validates before speaking.\n\n🔗 **Try it (no install):** [lens-app-gemma4.vercel.app](https://lens-app-gemma4.vercel.app)\n\n📹 **Demo video:** [YouTube walkthrough](https://youtu.be/qoDLKzzcYHM)\n\n💻 **Source:** [github.com/prasadt1/photography-coach-gemma4](https://github.com/prasadt1/photography-coach-gemma4) (Apache 2.0)\n\n## What I Built\n\nI built L.E.N.S. for someone like **Mohan** — a low-vision artisan who hand-knits sweaters to sell online. He can judge the knit by touch: tension, pattern, finish. What he cannot reliably judge is the **photograph** of the piece. Is it in focus? Is the light flat? Is the sweater cropped awkwardly or lost against the background? 
On a marketplace like Etsy, the photo *is* the product; a weak photo quietly costs the sale. Until now, that step has meant borrowing someone else’s eyes.\n\nL.E.N.S. closes that gap.\n\n- The maker points their camera and takes a photo.\n- **Gemma 4 E4B** — on their own machine, via Ollama — assesses framing, lighting, focus, and composition from the **image itself** (multimodal input, not a text-only description).\n- L.E.N.S. speaks back **one** specific, actionable fix: not “this photo is bad,” but “move back about six inches” or “the light is behind the sweater — turn toward the window.”\n- They take a second photo; L.E.N.S. **compares** the two images out loud and says which is stronger and why.\n- It drafts **copy-ready listing text** (title, description, and alt-text) ready to paste into their store.\n\nIt is **voice-first by design**, not a visual UI with audio bolted on. I built and tested the flow with a screen reader on and the screen off, because that is how it will actually be used. Structured JSON is an accessibility choice too: the client validates a strict schema and surfaces **discrete, ordered points**, so coaching stays one fix at a time instead of a wall of feedback the user cannot skim.\n\nI designed for the hardest case — a blind maker, fully offline — and by the curb-cut effect, the same coaching helps any maker without a photographer or a reliable connection.\n\n*Alt: infographic of five steps — artisan capture, on-device analysis, voice feedback loop, compare and iterate, then listing copy for Etsy or Shopify.*\n\n## Demo\n\nFull walkthrough ([YouTube](https://youtu.be/qoDLKzzcYHM)): first photo, spoken coaching, stronger retake, comparison, generated listing.\n\n### Try it live\n\n| Link | What you get |\n|---|---|\n| [lens-app-gemma4.vercel.app](https://lens-app-gemma4.vercel.app) | Judge / no-install demo. Sample photos play back real E4B runs recorded locally; uploads use Gemma 4 31B on Ollama Cloud so reviewers can try a photo without pulling a model. 
|\n| [photography-coach-gemma4.vercel.app](https://photography-coach-gemma4.vercel.app) | Real product path for the submission video — E4B on your Mac via Ollama, reached by the PWA over the same Wi‑Fi or through a tunnel. Photos do not go to Ollama Cloud on this deploy. |\n\nNo account. No tracking. Copy-ready output only — L.E.N.S. does not auto-publish to Etsy or Shopify.\n\n## Code\n\nSource, README, architecture notes, and spike write-ups live at [github.com/prasadt1/photography-coach-gemma4](https://github.com/prasadt1/photography-coach-gemma4). From the README:\n\n### 📷 L.E.N.S. — Local Edge Native Studio\n\nThe one step between a finished piece and a sale shouldn't depend on someone else's eyes.\n\n**A private, voice-guided photography coach for blind and low-vision artisans.**\n\n🔗 **Live demos:** [Judge try-it](https://lens-app-gemma4.vercel.app) (Ollama Cloud 31B) · [Real product / video](https://photography-coach-gemma4.vercel.app) (local E4B) · [**Demo video**](https://youtu.be/qoDLKzzcYHM) · Built for the **Gemma 4 Good Hackathon**\n\n**Tracks:** Digital Equity & Inclusivity · Ollama\n\n#### What L.E.N.S. is\n\nMohan has low vision. He hand-knits sweaters and can finish a flawless cable pattern by touch. He can shape, price, and list a piece on his own — until the one step he cannot finish alone: photographing it well enough to sell online.\n\n**L.E.N.S. closes that gap.** It is a voice-guided photography coach that helps blind and low-vision artisans *verify and improve their product photos before listing their work*. It runs Gemma 4 through Ollama, describes the photo in plain…\n\n**Stack:** React 19 + TypeScript PWA, optional Electron desktop build, **Ollama** for local multimodal inference, Web Speech API for coach voice output.\n\n**Repo highlights:**\n\n- Strict JSON contract — one schema drives description, colour check, single fix, alt-text, and listing copy.\n- Three **honestly labelled** inference modes (see below). 
- Spike docs: [Spike 1 — E4B via Ollama](https://github.com/prasadt1/photography-coach-gemma4/blob/main/spike/spike-1-results.md), [quantization study](https://github.com/prasadt1/photography-coach-gemma4/blob/main/docs/benchmarks/llama-cpp-quantization-study.md), [LiteRT iOS spike](https://github.com/prasadt1/photography-coach-gemma4/blob/main/docs/spikes/spike-3-litert-ios.md).\n\nThis is original work I built for accessibility-first product photography coaching; the repo is not a repackaged template.\n\n## How I Used Gemma 4\n\nGemma 4 is the core of L.E.N.S.: **multimodal photo assessment** and **coaching generation**. Every model and runtime choice followed from **local-first privacy** and **voice-loop latency**.\n\n### Why Gemma 4 E4B (and what I ruled out)\n\nThe Gemma 4 family spans small edge models (E2B, E4B), 31B Dense, and 26B MoE. For this project:\n\n| Variant | Role in my decision |\n|---|---|\n| E2B (~2B) | Too small for consistent visual judgment on real product photos. |\n| E4B (~4B) | Shipped. Small enough for consumer hardware + Ollama offline; capable enough for trustworthy multimodal coaching. |\n| 31B Dense | Ruled out for the product — too heavy for typical laptops, and hosting it remotely instead breaks the “photo never leaves the machine” promise. Used only for judge demo uploads on Ollama Cloud. |\n| 26B MoE | Strong for throughput/reasoning, but overkill for a single-photo voice loop on modest hardware; E4B matched the edge + multimodal product path better. |\n\n**E4B is the deliberate middle:** the trade-off *is* the project.\n\n### What E4B unlocked for this project\n\n- **Multimodal vision on-device** — real product photos in, structured coaching out (framing, light, focus, colour), not text-only guesses.\n- **Offline independence** — the product path never requires sending photos to a remote API.\n- **Usable voice-loop latency** — ~4B parameters + Q4_K_M + streaming TTS bring a warm turn to about 20s (down from ~40s early on). 
- **Strict JSON coaching** — one spatial fix, two-photo compare, listing copy — all from schemas Ollama enforces at generation time.\n- **Honest dual deploy** — E4B for the real maker story; 31B only where judges need a zero-install upload path.\n\n### Multimodal + structured output (how it’s wired)\n\nEach analyze call sends the image in Ollama’s `messages[].images[]` array and asks Gemma 4 E4B for JSON via Ollama’s `format` field (JSON Schema). The client validates before TTS speaks:\n\n```ts\n// services/ollamaService.ts — simplified\nconst messages = [\n  { role: 'system', content: buildSystemPrompt(/* artisan coaching */) },\n  { role: 'user', content: userPrompt, images: [base64ProductPhoto] },\n];\n\nconst res = await fetch(`${OLLAMA_BASE}/api/chat`, {\n  method: 'POST',\n  headers: { 'Content-Type': 'application/json' },\n  body: JSON.stringify({\n    model: 'gemma4:e4b',\n    messages,\n    format: ARTISAN_V3_OUTPUT_SCHEMA,  // Ollama enforces JSON shape\n    stream: true,                       // TTS starts before generation ends\n    options: { num_predict: cappedTokens },  // token cap keeps each turn short\n    keep_alive: '30m',                  // keep the model loaded between turns\n  }),\n});\n// With stream: true, res.body is newline-delimited JSON; each chunk carries message.content\n```\n\nThe artisan schema drives fields like scene description, **one** `priorityFix`, alt-text, and listing title/description — so VoiceOver/TalkBack and the coach voice never drown the maker in a paragraph of fixes.\n\n### Runtime: Ollama\n\nI spiked **Cactus** and **llama.cpp** as well. Ollama won for the cleanest local multimodal serving and the simplest path to multiple inference modes without rebuilding the pipeline each time.\n\n### Quantization: Q4_K_M\n\nOn modest hardware, **Q4_K_M** keeps E4B runnable without meaningfully hurting visual assessment. Lower-bit quants started to cost coaching quality; higher-bit ones were not worth the extra memory for this use case.\n\n### Latency and voice\n\nEarly warm inference was ~**40s** — too long for a spoken coaching loop. 
**Prompt tuning**, a **token cap**, a **warm-up call** on startup, and **streaming** brought warm runs to roughly **20s**.\n\n### Three honest inference modes\n\n| Mode | Model | Network |\n|---|---|---|\n| Local (product) | Gemma 4 E4B via Ollama on the maker’s machine | Fully offline |\n| Judge demo uploads | Gemma 4 31B on Ollama Cloud | Requires connection |\n| Demo mode | Playback of real recorded E4B responses | None |\n\nI also spiked **LiteRT** for true on-device iOS inference (~25 tok/s in Google’s reference app). That is **Phase 2** — documented as roadmap, not claimed as shipped. Today, iOS is covered by the installable PWA talking to Ollama on the Mac (same Wi‑Fi or tunnel).\n\n### Why local Gemma matters\n\nPrivacy here is not a bullet point — it is the mechanism of **independence**. A cloud coach swaps one dependency for another: instead of a sighted helper, you need connectivity, an account, and a server that receives your product photos. A capable **Gemma 4** model on the maker’s own hardware is what makes “I can list this myself” real.\n\n## Accessibility (why the UX matches the model story)\n\n- **Voice-first** with an equivalent labelled control for every voice action.\n- **Screen reader:** landmarks, live regions, managed focus; coach TTS works *alongside* VoiceOver/TalkBack, not instead of it.\n- **One fix at a time** — same discipline in prompt design and UI.\n- **Anti-hallucination** — states uncertainty when the image does not support a claim. 
- **Multilingual** coaching paths in the prompt layer.\n\n## What’s next\n\n- Native on-device iOS via LiteRT (spike done; integration is post-hackathon).\n- More languages and tighter cold-start latency.\n- Deeper maker workflows (batch listing prep) — still local-first.\n\n## Links\n\n- **This challenge:** [Gemma 4 Challenge on DEV](https://dev.to/challenges/google-gemma-2026-05-06)\n- **Live demo:** [lens-app-gemma4.vercel.app](https://lens-app-gemma4.vercel.app)\n- **Product / video deploy:** [photography-coach-gemma4.vercel.app](https://photography-coach-gemma4.vercel.app)\n- **GitHub:** [photography-coach-gemma4](https://github.com/prasadt1/photography-coach-gemma4)\n- **Demo video:** [youtu.be/qoDLKzzcYHM](https://youtu.be/qoDLKzzcYHM)", "url": "https://wpnews.pro/news/l-e-n-s-a-private-photography-coach-for-blind-and-low-vision-artisans", "canonical_source": "https://dev.to/prasadt1/lens-a-private-photography-coach-for-blind-and-low-vision-artisans-4mj2", "published_at": "2026-05-22 22:19:03+00:00", "updated_at": "2026-05-22 22:31:05.782484+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "open-source", "products"], "entities": ["L.E.N.S.", "Gemma 4", "Ollama", "Mohan", "Etsy", "Apache 2.0", "prasadt1", "Local Edge Native Studio"], "alternates": {"html": "https://wpnews.pro/news/l-e-n-s-a-private-photography-coach-for-blind-and-low-vision-artisans", "markdown": "https://wpnews.pro/news/l-e-n-s-a-private-photography-coach-for-blind-and-low-vision-artisans.md", "text": "https://wpnews.pro/news/l-e-n-s-a-private-photography-coach-for-blind-and-low-vision-artisans.txt", "jsonld": "https://wpnews.pro/news/l-e-n-s-a-private-photography-coach-for-blind-and-low-vision-artisans.jsonld"}}