# L.E.N.S. — A private photography coach for blind and low-vision artisans

> Source: <https://dev.to/prasadt1/lens-a-private-photography-coach-for-blind-and-low-vision-artisans-4mj2>
> Published: 2026-05-22 22:19:03+00:00

**L.E.N.S.** (Local Edge Native Studio) is a voice-guided photography coach that runs **Gemma 4 E4B** locally through **Ollama** — so a maker can verify and improve product photos before listing, without sending images to the cloud and without asking someone sighted to “just check this one.”

Gemma 4’s **native multimodal vision** is the engine: each coaching turn sends a real product photo (base64 in the Ollama chat) and gets back structured JSON the app validates before speaking.

🔗 **Try it (no install):** [lens-app-gemma4.vercel.app](https://lens-app-gemma4.vercel.app)

📹 **Demo video:** [YouTube walkthrough](https://youtu.be/qoDLKzzcYHM)

💻 **Source:** [github.com/prasadt1/photography-coach-gemma4](https://github.com/prasadt1/photography-coach-gemma4) (Apache 2.0)

## What I Built

I built L.E.N.S. for someone like **Mohan** — a low-vision artisan who hand-knits sweaters to sell online. He can judge the knit by touch: tension, pattern, finish. What he cannot reliably judge is the **photograph** of the piece. Is it in focus? Is the light flat? Is the sweater cropped awkwardly or lost against the background? On a marketplace like Etsy, the photo *is* the product; a weak photo quietly costs the sale. Until now, that step has meant borrowing someone else’s eyes.

L.E.N.S. closes that gap.

- The maker points their camera and takes a photo.
- **Gemma 4 E4B**, on their own machine via Ollama, assesses framing, lighting, focus, and composition from the **image itself** (multimodal input, not a text-only description).
- L.E.N.S. speaks back **one** specific, actionable fix: not “this photo is bad,” but “move back about six inches” or “the light is behind the sweater; turn toward the window.”
- They take a second photo; L.E.N.S. **compares** the two images out loud and says which is stronger and why.
- It drafts **copy-ready listing text** (title, description, and alt-text), ready to paste into their store.

It is **voice-first by design**, not a visual UI with audio bolted on. I built and tested the flow with a screen reader on and the screen off, because that is how it will actually be used. Structured JSON is an accessibility choice too: the client validates a strict schema and surfaces **discrete, ordered points**, so coaching stays one fix at a time instead of a wall of feedback the user cannot skim.

I designed for the hardest case — a blind maker, fully offline — and by the curb-cut effect, the same coaching helps any maker without a photographer or a reliable connection.

*Alt: infographic of five steps — artisan capture, on-device analysis, voice feedback loop, compare and iterate, then listing copy for Etsy or Shopify.*

## Demo

Full walkthrough: first photo, spoken coaching, stronger retake, comparison, generated listing.

### Try it live

| Link | What you get |
|---|---|
| [lens-app-gemma4.vercel.app](https://lens-app-gemma4.vercel.app) | Judge / no-install demo. Sample photos play back real E4B runs recorded locally; uploads use Gemma 4 31B on Ollama Cloud so reviewers can try a photo without pulling a model. |
| [photography-coach-gemma4.vercel.app](https://photography-coach-gemma4.vercel.app) | Real product path for the submission video: E4B on your Mac via Ollama (same Wi‑Fi PWA or tunnel). Photos do not go to Ollama Cloud on this deploy. |

No account. No tracking. Copy-ready output only — L.E.N.S. does not auto-publish to Etsy or Shopify.

## Code

Source, README, architecture notes, and spike write-ups:

# 📷 L.E.N.S. — Local Edge Native Studio

The one step between a finished piece and a sale shouldn't depend on someone else's eyes.

**A private, voice-guided photography coach for blind and low-vision artisans.**

🔗 **Live demos:** [Judge try-it](https://lens-app-gemma4.vercel.app) (Ollama Cloud 31B) · [Real product / video](https://photography-coach-gemma4.vercel.app) (local E4B) · **Demo video** · Built for the **Gemma 4 Good Hackathon**

**Tracks:** Digital Equity & Inclusivity · Ollama

## What L.E.N.S. is

Mohan has low vision. He hand-knits sweaters and can finish a flawless cable pattern by touch. He can shape, price, and list a piece on his own — until the one step he cannot finish alone: photographing it well enough to sell online.

**L.E.N.S. closes that gap.** It is a voice-guided photography coach that helps blind and low-vision artisans *verify and improve their product photos before listing their work*. It runs Gemma 4 through Ollama, describes the photo in plain…

**Stack:** React 19 + TypeScript PWA, optional Electron desktop build, **Ollama** for local multimodal inference, Web Speech API for coach voice output.

**Repo highlights:**

- Strict JSON contract — one schema drives description, colour check, single fix, alt-text, and listing copy.
- Three **honestly labelled** inference modes (see below).
- Spike docs: [Spike 1: E4B via Ollama](https://github.com/prasadt1/photography-coach-gemma4/blob/main/spike/spike-1-results.md), [quantization study](https://github.com/prasadt1/photography-coach-gemma4/blob/main/docs/benchmarks/llama-cpp-quantization-study.md), [LiteRT iOS spike](https://github.com/prasadt1/photography-coach-gemma4/blob/main/docs/spikes/spike-3-litert-ios.md).

This is original work I built for accessibility-first product photography coaching; the repo is not a repackaged template.

## How I Used Gemma 4

Gemma 4 is the core of L.E.N.S.: **multimodal photo assessment** and **coaching generation**. Every model and runtime choice followed from **local-first privacy** and **voice-loop latency**.

### Why Gemma 4 E4B (and what I ruled out)

The Gemma 4 family spans small edge models, 31B Dense, and 26B MoE. For this project:

| Variant | Role in my decision |
|---|---|
| E2B (~2B) | Too small for consistent visual judgment on real product photos. |
| E4B (~4B) | Shipped. Small enough for consumer hardware + Ollama offline; capable enough for trustworthy multimodal coaching. |
| 31B Dense | Ruled out for the product: too heavy for typical laptops; breaks the “photo never leaves the machine” promise. Used only for judge demo uploads on Ollama Cloud. |
| 26B MoE | Strong for throughput/reasoning, but overkill for a single-photo voice loop on modest hardware; E4B matched the edge + multimodal product path better. |

**E4B is the deliberate middle:** the trade-off *is* the project.

### What E4B unlocked for this project

- **Multimodal vision on-device:** real product photos in, structured coaching out (framing, light, focus, colour), not text-only guesses.
- **Offline independence:** the product path never requires sending photos to a remote API.
- **Usable voice-loop latency:** ~4B + Q4_K_M + streaming TTS ≈ 20s warm (down from ~40s early on).
- **Strict JSON coaching:** one spatial fix, two-photo compare, and listing copy, all from schemas Ollama enforces at generation time.
- **Honest dual deploy:** E4B for the real maker story; 31B only where judges need a zero-install upload path.
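The two-photo compare mentioned above could be sent as a single chat turn carrying both images. The following is a minimal sketch, not the repo's implementation: `COMPARE_SCHEMA`, `toRawBase64`, `buildCompareRequest`, and both prompt strings are hypothetical names and wording. It assumes photos arrive as data URLs (as canvas and `FileReader` produce them) and that Ollama's `images` field takes plain base64 strings.

```typescript
// Hypothetical sketch: only model, messages[].images, format and stream
// mirror the request fields shown in this article.
interface ChatMessage {
  role: 'system' | 'user';
  content: string;
  images?: string[]; // plain base64, without a "data:image/...;base64," prefix
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
  format: Record<string, unknown>;
  stream: boolean;
}

// Assumed shape for the comparison verdict; the real schema lives in the repo.
const COMPARE_SCHEMA: Record<string, unknown> = {
  type: 'object',
  properties: {
    strongerPhoto: { type: 'string', enum: ['first', 'second'] },
    reason: { type: 'string' },
  },
  required: ['strongerPhoto', 'reason'],
};

// Drop the data-URL header if present; pass plain base64 through unchanged.
function toRawBase64(dataUrlOrBase64: string): string {
  const marker = ';base64,';
  const at = dataUrlOrBase64.indexOf(marker);
  return at === -1 ? dataUrlOrBase64 : dataUrlOrBase64.slice(at + marker.length);
}

// Build one user turn that carries the original photo and the retake together.
function buildCompareRequest(firstPhoto: string, secondPhoto: string): ChatRequest {
  return {
    model: 'gemma4:e4b',
    messages: [
      {
        role: 'system',
        content: 'You compare two product photos for a blind or low-vision maker.',
      },
      {
        role: 'user',
        content: 'The first image is the original and the second is the retake. Which is stronger, and why?',
        images: [toRawBase64(firstPhoto), toRawBase64(secondPhoto)],
      },
    ],
    format: COMPARE_SCHEMA,
    stream: false,
  };
}
```

The returned object would be serialised with `JSON.stringify` and posted to `/api/chat`, the same endpoint the single-photo path uses.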

### Multimodal + structured output (how it’s wired)

Each analyze call sends the image in Ollama’s `messages[].images[]` array and asks Gemma 4 E4B for JSON via Ollama’s `format` field (JSON Schema). The client validates before TTS speaks:

``` js
// services/ollamaService.ts — simplified
const messages = [
  { role: 'system', content: buildSystemPrompt(/* artisan coaching */) },
  { role: 'user', content: userPrompt, images: [base64ProductPhoto] },
];

await fetch(`${OLLAMA_BASE}/api/chat`, {
  method: 'POST',
  body: JSON.stringify({
    model: 'gemma4:e4b',
    messages,
    format: ARTISAN_V3_OUTPUT_SCHEMA,  // Ollama enforces JSON shape
    stream: true,                       // TTS starts before generation ends
    options: { num_predict: cappedTokens },
    keep_alive: '30m',
  }),
});
```

The artisan schema drives fields like scene description, **one** `priorityFix`, alt-text, and listing title/description, so VoiceOver/TalkBack and the coach voice never drown the maker in a paragraph of fixes.
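To illustrate the client-side validation step, here is a minimal sketch of a validator that runs before anything is spoken. Only `priorityFix` is a field name taken from this article; `sceneDescription`, `altText`, `listingTitle`, `listingDescription`, and the helpers `parseCoachingTurn` and `toSpokenPoints` are assumptions for illustration, and the real schema in the repo may differ.

```typescript
// Hypothetical field list; only priorityFix is named in the article.
const REQUIRED_FIELDS = [
  'sceneDescription',
  'priorityFix',
  'altText',
  'listingTitle',
  'listingDescription',
] as const;

type FieldName = (typeof REQUIRED_FIELDS)[number];
type CoachingTurn = Record<FieldName, string>;

type ParseResult =
  | { ok: true; turn: CoachingTurn }
  | { ok: false; error: string };

// Reject anything that is not a JSON object with every field as a non-empty string.
function parseCoachingTurn(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: 'Response was not valid JSON' };
  }
  if (typeof data !== 'object' || data === null || Array.isArray(data)) {
    return { ok: false, error: 'Response was not a JSON object' };
  }
  const record = data as Record<string, unknown>;
  const turn = {} as CoachingTurn;
  for (const field of REQUIRED_FIELDS) {
    const value = record[field];
    if (typeof value !== 'string' || value.trim() === '') {
      return { ok: false, error: `Missing or empty field: ${field}` };
    }
    turn[field] = value.trim();
  }
  return { ok: true, turn };
}

// Spoken output stays short and ordered: what the photo shows, then the single fix.
function toSpokenPoints(turn: CoachingTurn): string[] {
  return [turn.sceneDescription, `One fix: ${turn.priorityFix}`];
}
```

Because `priorityFix` must be a single string, a response that returns a list of fixes fails validation instead of being read out as a wall of feedback.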

### Runtime: Ollama

I spiked **Cactus** and **llama.cpp** as well. Ollama won for the cleanest local multimodal serving and the simplest path to multiple inference modes without rebuilding the pipeline each time.

### Quantization: Q4_K_M

On modest hardware, **Q4_K_M** keeps E4B runnable without meaningfully hurting visual assessment. Lighter quants started to cost coaching quality; heavier ones were not worth the memory for this use case.

### Latency and voice

Early warm inference was ~**40s** — too long for a spoken coaching loop. **Prompt tuning**, a **token cap**, a **warm-up call** on startup, and **streaming** brought warm runs to roughly **20s**.
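Two of those steps can be sketched in a few lines. This is a rough illustration under stated assumptions, not the code in the repo: `buildWarmUpRequest` and `StreamingFieldReader` are hypothetical names. The warm-up relies on Ollama's documented behaviour of loading a model when `/api/chat` receives an empty `messages` array. The reader assumes the streamed response is newline-delimited JSON whose `message.content` fragments add up to the schema-shaped JSON, and it hands a field to TTS as soon as that field's closing quote arrives.

```typescript
// Warm-up body: an empty messages array asks Ollama to load the model
// without generating anything; keep_alive keeps it resident afterwards.
function buildWarmUpRequest(model: string): { model: string; messages: never[]; keep_alive: string } {
  return { model, messages: [], keep_alive: '30m' };
}

// One line of Ollama's streamed chat response (only the parts used here).
interface ChatChunk {
  message?: { content?: string };
  done?: boolean;
}

class StreamingFieldReader {
  private readonly fields: readonly string[];
  private lineBuffer = ''; // partial NDJSON line still waiting for its newline
  private jsonText = '';   // model output accumulated so far
  private readonly emitted = new Set<string>();

  // fields: plain identifiers (no regex metacharacters), in speaking order.
  constructor(fields: readonly string[]) {
    this.fields = fields;
  }

  // Feed raw response text; returns [field, value] pairs completed by this call.
  push(networkText: string): Array<[string, string]> {
    this.lineBuffer += networkText;
    const lines = this.lineBuffer.split('\n');
    this.lineBuffer = lines.pop() ?? ''; // keep the trailing partial line
    for (const line of lines) {
      if (line.trim() === '') continue;
      const chunk = JSON.parse(line) as ChatChunk;
      this.jsonText += chunk.message?.content ?? '';
    }
    return this.collectCompletedFields();
  }

  private collectCompletedFields(): Array<[string, string]> {
    const ready: Array<[string, string]> = [];
    for (const field of this.fields) {
      if (this.emitted.has(field)) continue;
      // Matches "field": "..." only once the closing quote has been received.
      const pattern = new RegExp(`"${field}"\\s*:\\s*("(?:[^"\\\\]|\\\\.)*")`);
      const literal = pattern.exec(this.jsonText)?.[1];
      if (literal !== undefined) {
        this.emitted.add(field);
        ready.push([field, JSON.parse(literal) as string]); // decode JSON escapes
      }
    }
    return ready;
  }
}
```

Early speech from this path has not yet passed full schema validation, so a cautious client would still validate the complete response once the final chunk arrives.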

### Three honest inference modes

| Mode | Model | Network |
|---|---|---|
| Local (product) | Gemma 4 E4B via Ollama on the maker’s machine | Fully offline |
| Judge demo uploads | Gemma 4 31B on Ollama Cloud | Requires connection |
| Demo mode | Playback of real recorded E4B responses | None |
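One way to keep those labels honest in code is to make the mode an explicit value that the UI must disclose, rather than a silent fallback. The sketch below is an assumption about how that could look, not the repo's implementation: `MODES`, `selectMode`, `disclosureFor`, the `deploy` flag, and the `gemma4:31b` tag are hypothetical.

```typescript
type InferenceMode = 'local' | 'judgeUpload' | 'demoPlayback';

interface ModeConfig {
  label: string;
  model: string | null; // null: nothing is generated, recordings are replayed
  network: 'offline' | 'required' | 'none';
  photoLeavesDevice: boolean;
}

// Mirrors the table above; the 31B model tag is an assumed name.
const MODES: Record<InferenceMode, ModeConfig> = {
  local: { label: 'Local (product)', model: 'gemma4:e4b', network: 'offline', photoLeavesDevice: false },
  judgeUpload: { label: 'Judge demo uploads', model: 'gemma4:31b', network: 'required', photoLeavesDevice: true },
  demoPlayback: { label: 'Demo mode', model: null, network: 'none', photoLeavesDevice: false },
};

interface ModeInputs {
  deploy: 'product' | 'judge'; // which of the two deployments is running
  isSamplePhoto: boolean;      // a bundled sample rather than the user's own photo
}

// The product deploy never falls back to the cloud; only the judge deploy can upload.
function selectMode(inputs: ModeInputs): InferenceMode {
  if (inputs.deploy === 'product') return 'local';
  return inputs.isSamplePhoto ? 'demoPlayback' : 'judgeUpload';
}

// Text the coach can speak so the user always knows where the photo goes.
function disclosureFor(mode: InferenceMode): string {
  const config = MODES[mode];
  const where = config.photoLeavesDevice
    ? 'Your photo will be sent to a cloud model.'
    : 'Your photo stays on this device.';
  return `${config.label}. ${where}`;
}
```

Keeping `photoLeavesDevice` in the config means the spoken disclosure and the network behaviour are derived from the same source.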

I also spiked **LiteRT** for true on-device iOS inference (~25 tok/s in Google’s reference app). That is **Phase 2** — documented as roadmap, not claimed as shipped. Today, iOS is covered by the installable PWA talking to Ollama on the Mac (same Wi‑Fi or tunnel).

### Why local Gemma matters

Privacy here is not a bullet point — it is the mechanism of **independence**. A cloud coach swaps one dependency for another: instead of a sighted helper, you need connectivity, an account, and a server that receives your product photos. A capable **Gemma 4** model on the maker’s own hardware is what makes “I can list this myself” real.

## Accessibility (why the UX matches the model story)

- **Voice-first** with an equivalent labelled control for every voice action.
- **Screen reader:** landmarks, live regions, managed focus; coach TTS works *alongside* VoiceOver/TalkBack, not instead of it.
- **One fix at a time:** same discipline in prompt design and UI.
- **Anti-hallucination:** states uncertainty when the image does not support a claim.
- **Multilingual** coaching paths in the prompt layer.
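The first point, an equivalent labelled control for every voice action, can be enforced structurally by routing both input paths through one registry. The sketch below is an illustration only: `ActionRegistry`, `CoachAction`, and the sample `retake` action are hypothetical, the transcript is assumed to come from whatever speech recogniser the app uses, and the phrase matching is English-only for brevity.

```typescript
interface CoachAction {
  id: string;
  buttonLabel: string;    // accessible name for the on-screen control
  voicePhrases: string[]; // spoken equivalents of the same action
  run: () => string;      // returns the text to announce
}

// Lower-case, strip punctuation, collapse whitespace (ASCII only in this sketch).
function normalise(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9\s]/g, '').replace(/\s+/g, ' ').trim();
}

class ActionRegistry {
  private readonly actions = new Map<string, CoachAction>();

  // Refuse any action that lacks either a button label or a voice phrase.
  register(action: CoachAction): void {
    if (action.buttonLabel.trim() === '' || action.voicePhrases.length === 0) {
      throw new Error(`Action "${action.id}" needs both a button label and a voice phrase`);
    }
    this.actions.set(action.id, action);
  }

  runFromButton(id: string): string | null {
    const action = this.actions.get(id);
    return action ? action.run() : null;
  }

  // Both input paths end in the same run() handler.
  runFromVoice(transcript: string): string | null {
    const heard = normalise(transcript);
    for (const action of Array.from(this.actions.values())) {
      if (action.voicePhrases.some((phrase) => normalise(phrase) === heard)) {
        return action.run();
      }
    }
    return null;
  }
}

const registry = new ActionRegistry();
registry.register({
  id: 'retake',
  buttonLabel: 'Take another photo',
  voicePhrases: ['retake', 'take another photo'],
  run: () => 'Camera ready.',
});
```

Because registration fails without both a label and a phrase, a voice-only or button-only action cannot be added by accident.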

## What’s next

- Native on-device iOS via LiteRT (spike done; integration is post-hackathon).
- More languages and tighter cold-start latency.
- Deeper maker workflows (batch listing prep) — still local-first.

## Links

- **This challenge:** [Gemma 4 Challenge on DEV](https://dev.to/challenges/google-gemma-2026-05-06)
- **Live demo:** [lens-app-gemma4.vercel.app](https://lens-app-gemma4.vercel.app)
- **Product / video deploy:** [photography-coach-gemma4.vercel.app](https://photography-coach-gemma4.vercel.app)
- **GitHub:** [photography-coach-gemma4](https://github.com/prasadt1/photography-coach-gemma4)
- **Demo video:** [youtu.be/qoDLKzzcYHM](https://youtu.be/qoDLKzzcYHM)
