{"slug": "single-page-claude-writes-beautifully-at-5-pages-it-drifts-here-s-the-harness-i", "title": "Single-page Claude writes beautifully. At 5 pages it drifts. Here's the harness I built.", "summary": "A developer built a harness with 14 gates, auto-retry, and handoff JSON to stop multi-page drift when generating React apps from Figma files using LLMs like Claude and Codex. The harness achieved a 100% build-green rate across 10 demos and 54 screens in four business domains. The developer identified four specific drift modes in multi-page output: world-state re-invention, route hallucination, store-key decay, and silent build failures.", "body_md": "I gave Claude / Codex a Figma file + a PRD and asked for 5-10 React pages of a working app. **Single-page output is great. Multi-page output drifts in 4 specific ways.** I spent ~3 months building a harness with 14 gates × auto-retry × handoff JSON to stop the drift. 10 demos, 54 screens, 4 unrelated business domains, build-green rate 100%.\n\nCode: [https://github.com/JiuwenDragon/harness-mini](https://github.com/JiuwenDragon/harness-mini)\n\nEvery \"Figma to code with AI\" demo on Twitter shows one screen. That's a real result — Claude vision is genuinely good at single-page UI. I verified this many times during my research: **giving Claude a screenshot + a paragraph of PRD produces a 70-80 point page in 30 seconds**.\n\nThe promise breaks at 5+ screens. Here are the 4 drift modes I measured.\n\n| Screen 1 | Screen 2 | Screen 3 |\n|---|---|---|\n| Username: Zhang San | Username: Li Si | Username: Test User |\n\nLLM doesn't carry a \"world state\" across page generations. Without explicit injection, it re-invents.\n\n``` js\n// Screen \"transfer\" generated:\n<button onClick={() => router.push(\"/banking/home\")}>  // ← /banking\n// Screen \"home\" actually at:\napp/bank/home/page.tsx                                  // ← /bank\n```\n\nSingle-page review never catches this. 
Click-through breaks.\n\nA zustand store with 5 keys (user, balance, lastTx, recent[], selected). The LLM forgets 2-3 keys by screen 4 and makes up new ones. Same business concept, three different variable names.\n\n```\n> Codex: All 10 pages generated, ready to preview.\n> me: npm run build\n> 3 pages: red. 2 pages: empty <div /> stubs. 1 page: import path wrong.\n```\n\nThis one is the most painful. Without an external check, \"claimed done\" ≠ done.\n\n```\nFigma + PRD\n    ↓ intake (fixture split)\n    ↓ contract (frozen spec)\n    ↓ generate (codex / claude / gemini)\n    ↓ 14 gates (semantic / PRD / spec / UI hygiene / build / cross-canvas)\n    ↓ visual review (human)\n    ↓ web-preview (clickable)\n```\n\nEach gate is **scoped to one constraint**. Why? See the Constraint Decay paper (arXiv 2605.06445): stuffing 10+ constraints into one prompt drops LLM performance by 30 percentage points.\n\nThe retry loop: when a gate fails, the gate's structured error report (not a vague \"try again\") is fed back to the LLM. Reflexion-style.\n\nThe handoff: each stage emits `*_status.json` so a new operator (or a new LLM session) can pick up without reading the conversation.\n\nConstraint Decay (arXiv 2605.06445) measured the drop directly.\n\nLost in the Middle (arXiv 2307.03172) shows the LLM ignores constraints buried in long prompts.\n\nSo I push **one check per gate**, max ~3 constraints per LLM round.\n\n| Domain | Color | Screens | Build pass |\n|---|---|---|---|\n| Banking | Deep red | 10 | 10/10 |\n| Fitness | Orange | 3 | 3/3 |\n| Travel | Blue | 3 | 3/3 |\n| Shoes | Black | 3 | 3/3 |\n\nSame 14 gates. Same Codex/Claude/Gemini providers swapped via contract. 
No per-domain prompt tuning.\n\n| Tool | Strength | Why it's not what I needed |\n|---|---|---|\n| Builder.io Visual Copilot | 2M+ training data, Mitosis IR | SaaS, no PRD dim, no audit trail |\n| Locofy LDM | Large Design Model | SaaS, design system requires strict Auto Layout |\n| Figma Make | Highest fidelity (EPAM benchmark) | No public API, browser-only, $16/mo seat |\n| v0 (Vercel) | Tight shadcn/Next.js | Figma link silently downgrades to screenshot (loses metadata) |\n\nThese are all great for \"single dev makes a pretty page.\" None give me **multi-page consistency + PRD enforcement + audit log + on-prem + provider swap**, which is the actual enterprise need.\n\n[https://github.com/JiuwenDragon/harness-mini](https://github.com/JiuwenDragon/harness-mini)\n\n`scripts/`\n\nMIT license (I should add the file; open to PR).\n\nHappy to answer questions in comments. The most useful feedback would be: \"what other drift modes have you seen at >5 pages?\"", "url": "https://wpnews.pro/news/single-page-claude-writes-beautifully-at-5-pages-it-drifts-here-s-the-harness-i", "canonical_source": "https://dev.to/xiaolangtizi/single-page-claude-writes-beautifully-at-5-pages-it-drifts-heres-the-harness-i-built-4cdn", "published_at": "2026-06-18 05:17:55+00:00", "updated_at": "2026-06-18 05:51:48.479827+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "developer-tools", "ai-agents", "generative-ai"], "entities": ["Claude", "Codex", "Gemini", "Builder.io", "Locofy", "Figma", "Vercel", "arXiv"], "alternates": {"html": "https://wpnews.pro/news/single-page-claude-writes-beautifully-at-5-pages-it-drifts-here-s-the-harness-i", "markdown": "https://wpnews.pro/news/single-page-claude-writes-beautifully-at-5-pages-it-drifts-here-s-the-harness-i.md", "text": "https://wpnews.pro/news/single-page-claude-writes-beautifully-at-5-pages-it-drifts-here-s-the-harness-i.txt", "jsonld": 
"https://wpnews.pro/news/single-page-claude-writes-beautifully-at-5-pages-it-drifts-here-s-the-harness-i.jsonld"}}