# UXRay: I Built an AI That Roasts Your UI Like a Senior Designer Would

> Source: <https://dev.to/pulkitgovrani/uxray-i-built-an-ai-that-roasts-your-ui-like-a-senior-designer-would-2gfl>
> Published: 2026-05-23 11:19:45+00:00

*This is a submission for the Gemma 4 Challenge: Build with Gemma 4*

## What I Built

**UXRay** — drop a screenshot or paste a URL, get a full UX audit in seconds.

Most designers and developers ship UIs without a systematic critique. Hiring a UX consultant is expensive. Running a full user study takes weeks. UXRay closes that gap: it gives you the same structured, heuristic-based analysis a senior UX professional would produce — instantly, locally, and for free.

You give UXRay a UI (file upload or live URL) and it returns:

- **Overall UX score** (0–100)
- **Cognitive load analysis**: is the interface overwhelming users?
- **Trust score**: what signals build or erode credibility?
- **Friction points**: specific elements causing drop-off, each mapped to a Nielsen heuristic and rated critical / warning / info
- **Prioritized recommendations**: actionable fixes sorted by urgency, with effort and impact ratings
- **Accessibility flags**: WCAG 2.1 violations visible in the screenshot
- **Layout analysis**: fold content, visual hierarchy strength, whitespace quality, and scan pattern (Z vs F)

The analysis is grounded in established UX theory: Nielsen's 10 Usability Heuristics, Gestalt principles, Fogg's trust heuristics, Sweller's cognitive load theory, and WCAG 2.1. Every friction point cites the exact heuristic it violates so you know *why* something is a problem, not just *that* it is.

**Stack:** Next.js 16 (App Router, TypeScript) · Tailwind v4 · Framer Motion · Gemma 4 E4B via Ollama · Playwright microservice for URL screenshots · Zod for structured output validation

## Demo

Live test: I pointed UXRay at dev.to. It captured a full-page screenshot, ran the Gemma 4 analysis, and returned a structured result — 85 overall score, 3 friction points, 3 prioritized recommendations — in about 56 seconds on CPU, no GPU required.

## Code

# UXRay — AI-Powered UX Analysis

X-ray your interface through AI. Powered by Gemma 4 E4B.

UXRay analyzes any UI screenshot like a behavioral psychologist — detecting cognitive load, trust signals, friction points, and actionable redesign recommendations. It uses **Gemma 4's native multimodal vision** to *see* the interface directly, not just process text descriptions.

Built for the **Google Gemma 2026 Hackathon** on dev.to.

## Demo

Upload a screenshot or paste a URL → Gemma 4 analyzes it → structured UX critique appears:

- **Overall UX Score** (0–100)
- **Cognitive Load** gauge with specific issues
- **Trust Score** with positive/negative signals
- **Friction Points** with heuristic references (Nielsen, Gestalt, WCAG)
- **Recommendations** sorted by priority with effort/impact ratings
- **Accessibility Flags** and Layout Analysis

## Prerequisites

- **Ollama** installed and running:

  ```
  brew install ollama
  brew services start ollama
  ```

- **Gemma 4 E4B** pulled:

  ```
  ollama pull gemma4:e4b
  ```

- **Node.js 18+**

## Setup

```
# Clone the repo
git clone <repo-url>
cd uxray
# Install
```

…

The two key pieces of the pipeline:

**1. Gemma 4 client (`web/lib/gemma.ts`)**

Sends the screenshot as a raw base64 image to Ollama's `/api/generate` endpoint with `format: "json"` set, streams the NDJSON response token by token, and validates the output against a strict Zod schema. If JSON parsing fails on the first pass, it automatically retries at a lower temperature (0.1) to coax a clean response.

``` js
const response = await fetch(`${OLLAMA_BASE_URL}/api/generate`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "gemma4:e4b",
    prompt: SYSTEM_PROMPT + "\n\n" + USER_PROMPT,
    images: [base64Image],   // raw base64, no data URI prefix
    format: "json",          // enforces valid JSON output
    stream: true,
    options: {
      temperature: 0.3,
      num_ctx: 8192,
    },
  }),
});
```
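The streaming and retry steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from `web/lib/gemma.ts`: `collectResponse` and `analyzeWithRetry` are hypothetical names, and `generate` is an assumed stand-in for the fetch call plus stream reading. It relies only on the fact that Ollama's streaming `/api/generate` emits one JSON object per line, each carrying a `response` text fragment, with a final object marked `done: true`.

```typescript
// Shape of one NDJSON line from Ollama's streaming /api/generate response.
interface GenerateChunk {
  response?: string;
  done?: boolean;
}

// Hypothetical helper: joins streamed network chunks into the full model output.
// A chunk can end mid-line, so the unfinished tail is carried into the next chunk.
function collectResponse(chunks: string[]): string {
  let buffer = "";
  let text = "";
  const consume = (line: string): void => {
    if (line.trim() === "") return; // skip blank lines between objects
    const parsed = JSON.parse(line) as GenerateChunk;
    text += parsed.response ?? "";
  };
  for (const chunk of chunks) {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // the last piece may be an incomplete line
    lines.forEach(consume);
  }
  consume(buffer); // flush a final line that had no trailing newline
  return text;
}

// Hypothetical retry wrapper: `generate` resolves to the model's full text
// output at the given temperature. Defaults mirror the 0.3 -> 0.1 fallback.
async function analyzeWithRetry(
  generate: (temperature: number) => Promise<string>,
  temperatures: number[] = [0.3, 0.1],
): Promise<unknown> {
  let lastError: unknown = new Error("no attempts made");
  for (const temperature of temperatures) {
    try {
      return JSON.parse(await generate(temperature));
    } catch (error) {
      lastError = error; // bad JSON or a failed request: try the next temperature
    }
  }
  throw lastError;
}
```

The parsed value is returned as `unknown` on purpose, so that schema validation stays a separate step after parsing.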

**2. Playwright screenshot service (`playwright-service/server.js`)**

A small Express server that accepts a URL, spins up Chromium, captures a full-page screenshot, and returns it as base64. This lets UXRay analyze any live site without leaving the local pipeline.
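Whichever path supplies the image (file upload or this screenshot service), the request's `images` array expects bare base64 with no data URI prefix. Browser file reads often produce a data URL instead, for example via `FileReader.readAsDataURL`, so a small normalizer can help. The sketch below is an assumption about how that could be handled; `toRawBase64` is a hypothetical name, not something taken from the repo.

```typescript
// Hypothetical helper: returns bare base64 whether or not the input carries
// a "data:<mime>;base64," prefix.
function toRawBase64(image: string): string {
  const trimmed = image.trim();
  if (!trimmed.startsWith("data:")) return trimmed; // already bare base64
  const comma = trimmed.indexOf(",");
  // A data URL separates its header from its payload with the first comma.
  return comma === -1 ? trimmed : trimmed.slice(comma + 1);
}
```

Normalizing in one place lets both input paths feed the same Ollama request without either one knowing how the other encodes its image.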

To run it yourself:

```
# Pull the model first
ollama pull gemma4:e4b

# Start both services (Next.js on :3000, Playwright on :3001)
npm install && npm run dev
```

## How I Used Gemma 4

I chose **Gemma 4 E4B** (the 4-billion-parameter multimodal variant) for three reasons:

### 1. Multimodal vision is load-bearing, not decorative

UXRay's entire value proposition requires *seeing* the UI. The model has to identify specific elements — button labels, color contrast, spacing, typography — and reason about them in relation to UX principles. Gemma 4's vision capability handles this natively. There's no separate OCR step, no layout parsing pipeline, no element segmentation — the model just looks at the screenshot and reasons.

### 2. E4B runs on CPU in a reasonable time

The 4B parameter count was a deliberate choice. I wanted UXRay to work on a developer's laptop without requiring a GPU. At ~56 seconds for a full audit on CPU, E4B hits the sweet spot: thorough enough to produce genuinely useful output, fast enough to feel interactive. The 31B Dense model would have been overkill for a local-first tool, and E2B felt too thin for the reasoning depth the structured output requires.

### 3. JSON mode + structured output validation

Setting `format: "json"` in the Ollama request pushes Gemma 4 to emit valid JSON directly, which I then validate with a Zod schema. The system prompt defines the exact schema (`frictionPoints`, `cognitiveLoad`, `trustScore`, `layoutAnalysis`) and the model follows it reliably. This makes the output directly renderable in the UI with zero post-processing.

The system prompt grounds every analysis in specific UX frameworks so the model doesn't just describe what it sees — it diagnoses *why* it's a problem and cites the principle being violated:

```
You are UXRay, an expert UX analyst with deep knowledge of:
- Nielsen's 10 Usability Heuristics
- Gestalt principles of visual design
- WCAG 2.1 accessibility guidelines
- Cognitive load theory (Sweller)
- Trust and credibility heuristics (Fogg's Persuasive Technology)
- Conversion rate optimization (CRO)
```

A real friction point from the dev.to analysis looks like this:

```
{
  "id": "fp-1",
  "location": "Primary CTA button",
  "description": "Button label 'Get started' is generic — users cannot predict what commitment they're making, increasing hesitation at the conversion moment.",
  "severity": "warning",
  "heuristic": "Nielsen #6 — Recognition over recall"
}
```
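A friction point like the one above can be checked before it reaches the UI. The post does not show its Zod schema, so the sketch below is a dependency-free stand-in written as a plain type guard. The field names come from the JSON example and the severity levels from the feature list; `isFrictionPoint` and the exact rules it applies are assumptions.

```typescript
type Severity = "critical" | "warning" | "info";

interface FrictionPoint {
  id: string;
  location: string;
  description: string;
  severity: Severity;
  heuristic: string;
}

const SEVERITIES: readonly string[] = ["critical", "warning", "info"];
const STRING_FIELDS: readonly string[] = ["id", "location", "description", "heuristic"];

// Hypothetical stand-in for a schema check; the project itself uses Zod.
function isFrictionPoint(value: unknown): value is FrictionPoint {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  const severity = record["severity"];
  return (
    STRING_FIELDS.every((key) => typeof record[key] === "string") &&
    typeof severity === "string" &&
    SEVERITIES.indexOf(severity) !== -1 // must be one of the three known levels
  );
}
```

Rejecting unknown severity values at this boundary means the rendering code only has to handle the three levels it knows about.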

Gemma 4's ability to follow a complex, multi-section JSON schema while simultaneously reasoning about visual design principles across a real screenshot is what makes this whole approach viable. Swap it for a text-only model and UXRay doesn't exist.

*Built with Gemma 4 E4B + Ollama + Next.js 16. Runs fully local — your screenshots never leave your machine.*
