{"slug": "teaching-agents-product-design-at-vercel", "title": "Teaching agents product design at Vercel", "summary": "Vercel has introduced a system called 'product-design' to teach coding agents the reasoning behind product decisions, storing accepted product decisions in the repository and making them available to agents. The system includes an agent skill, linters for automatic enforcement, and a review loop that gathers evidence from Slack, Figma, and GitHub. This approach aims to give agents context beyond code, enabling them to produce UI that aligns with the team's standards.", "body_md": "Coding agents can produce working UI fast, but the harder problem has a different shape. They can copy your product's style, match its patterns, and try to follow its conventions. What they cannot do is understand why those patterns exist. Code shows agents what shipped, not why one component, phrase, or interaction became your standard. That reasoning lives in design reviews, PR comments, Slack threads, and with the people who were in the room. For an agent, context that isn't in the codebase doesn't exist.\n\nVercel is an agent-native team. We treat accepted product decisions like code, keeping them in the repository, reviewing changes against them, and making them available to every agent working there.\n\nThe way we do this is through `product-design`. It's a system with three parts:\n\nAn agent skill that gives coding agents the context behind decisions that require product or codebase judgment.\n\nLinters that enforce clear rules automatically.\n\nA review loop that gathers evidence from Slack, Figma, and GitHub, then prepares guideline updates for review.\n\nAny team can build the same structure around their own standards.\n\nThe skill lives inside the repository alongside the code it governs. Here's a simplified view of its structure:\n\nThe repository `AGENTS.md` tells coding agents when to load the skill. 
The skill-local `AGENTS.md` defines load order, validation, and governance. `SKILL.md` owns the runtime workflow.\n\n`references/` stores product-judgment, interface-quality, resilience, copy, canonical product names, interaction patterns, and surface-specific decisions.\n\n`exemplars/` documents decisions worth repeating from shipped pull requests, along with mistakes to avoid. `coverage-gaps.md` lists areas where we do not have a standard yet.\n\n`copywriting-eval/` tests copy and interface-language behavior. It does not evaluate the broader product-design workflow.\n\n`SKILL.md` resolves the request mode first: shape, implement, review, copy, or harden. This keeps audits from becoming edits and copy passes from expanding into redesigns. It skips backend-only work, telemetry, console errors, generated files, and tests with no shipped UI impact.\n\nThe skill routes to canonical sources instead of duplicating them. Component APIs, design-system rules, accessibility criteria, and interaction guidance stay with their owners.\n\nRouting is specific to both task and surface. Material changes load product-judgment and interface-quality first. Copy, component, layout, interaction, accessibility, and resilience work each route to focused references. A modal loads destructive-action patterns and canonical verbs. A settings form loads labels, validation, progressive disclosure, and accessible-name guidance.\n\nYou can use this simplified structure as a starting point and replace the paths and standards with your own:\n\nRouting is only part of what makes the skill useful. 
The other part is how findings stay traceable once the skill produces them.\n\nCopy rules have stable IDs and point to their canonical sources:\n\nWhen Vercel Agent proposes a patch, it validates the change in a secure Vercel Sandbox with the repository's builds, tests, and linters before posting the suggestion.\n\nWe prefer deterministic checks when a linter can enforce a rule reliably. Linters are fast and cheap to run, so developers and coding agents get feedback while they work instead of waiting for a later review.\n\nCode can count two or three static options, so a linter can recommend radio buttons. Naming the right object and consequence for a destructive action requires product context, so the skill handles it.\n\nExamples in the codebase include rules that:\n\nPrevent nested modals, which break focus management, keyboard navigation, and layering.\n\nRecommend radio buttons instead of a select for two or three static options, so every choice stays visible.\n\nRequire accessible names for icon buttons and form controls, and reject custom focus rings that bypass shared focus tokens.\n\nPrevent `className` from overriding a design-system component's color, radius, or shadow while still allowing layout classes.\n\nRequire `Modal.Body` so long content scrolls correctly and headers and footers can remain sticky.\n\nReplace raw shadows with theme-aware Material classes and reject borders that duplicate a Material's built-in treatment.\n\nFlag arbitrary spacing that falls off the 4px grid and suggest a standard utility when one exists.\n\nEach rule explains why the pattern is a problem and suggests a concrete fix. 
Some rules autofix safe migrations, such as replacing deprecated Tailwind utility names.\n\nAccepted decisions can take several forms:\n\nHuman-readable guidance next to the relevant Geist component, such as [Checkbox best practices](https://vercel.com/geist/checkbox).\n\nAgent guidance in the `product-design` skill.\n\nA lint rule when code can check it reliably.\n\nThe lint rule below shows how one product guideline is encoded as a deterministic check:\n\nEach of these catches a class of mistake automatically, freeing code review for the decisions that actually require judgment.\n\nLint rules are deterministic, but agent behavior can vary, so we test the skill on interfaces it has not seen before.\n\nAn agent edits a before state, then a judge checks the results against a rubric.\n\nEvals come from shipped examples documented in the skill. Holdouts hide their expected edits, testing whether the guidance generalizes. We also run fixtures without the skill to measure whether it changed the agent's behavior.\n\nWe score rule correctness separately from similarity to the shipped result. Shipped code can contain a flaw that the agent should improve instead of reproduce.\n\nProduct standards change as components, names, workflows, and failure states change, and every update needs evidence and human review.\n\nOur weekly evidence-intake workflow collects design feedback that may improve `product-design`. It searches Slack conversations and preserves links to Figma files, pull requests, review comments, and previews as evidence. 
When evidence is incomplete, it records the code or commit needed for verification.\n\nThe workflow separates collection from judgment:\n\nA collector gathers messages, links, and nearby context without proposing rules.\n\nA separate judge groups the evidence, verifies sources, and records open questions.\n\nThe job creates a review packet with candidates, rejected topics, follow-up requests, and coverage gaps.\n\nEvery candidate links to its source and remains pending. A comment from an experienced reviewer can raise its priority, but every candidate still needs evidence.\n\nAutomation ends with the review packet. A human decides whether a candidate becomes agent guidance, a lint rule, an example, an eval, or no change. Accepted changes go into the narrowest relevant file and pass the relevant checks before merging.\n\nOur setup reflects Vercel's product, components, and review history, but other teams can adapt the structure to their own standards.\n\nChoose one product surface where the same review comments keep appearing: destructive actions, error states, settings forms, empty states, or navigation. Collect examples from shipped code and real reviews, and write down the decision, why it matters, exceptions, and the source.\n\nAvoid starting with broad adjectives like `clear`, `polished`, or `intuitive`. Agents need observable decisions. `Destructive actions use Verb + Noun` is usable. `Buttons should be clear` is not.\n\nFill in the fields specific to your surface before expanding to others.\n\nTell agents when to load the skill in persistent repository instructions, and define the files and surfaces it covers along with the areas it must skip. In [separate Next.js evals](https://vercel.com/blog/agents-md-outperforms-skills-in-our-agent-evals), agents failed to invoke an available skill in 56% of cases. 
Test the trigger separately from the guidance, because failing to load the skill and failing to follow a rule are different problems.\n\nAsk the agent to report which surfaces and references it loaded, then verify that its findings cite those sources.\n\nUse a short entry point to identify the surface and load focused references. Organize the details around surfaces and decisions reviewers already discuss: forms, modals, navigation, product vocabulary, workflow states, and cross-surface patterns.\n\nGive rules stable IDs and link them to examples and sources. Record shipped examples with both useful decisions and known flaws, and keep missing guidance visible in a coverage-gap list.\n\nA coverage-gap list makes missing guidance explicit.\n\nIf a linter can identify a problem reliably, enforce the rule there. Use agent guidance when the decision needs product or codebase context. Keep new standards, policy choices, and unresolved product decisions with people.\n\nBuild training fixtures from documented examples and holdouts from interfaces whose expected edits do not appear in the skill. Test retrieval and application separately, because whether the agent loaded the skill and whether it followed the rule are different questions.\n\nIf a rule cannot stay reliable without many exceptions, move it back to agent guidance.\n\nReview new evidence regularly, but require human approval before changing the guidance or checks. Keep a decision log that records what changed, why, and which source supported it. Treat new rules as product changes, reviewing and testing each one, and removing those that stop helping.\n\nStart with one surface and the decisions your team already repeats. Put those decisions where code is written and reviewed, and keep people responsible for what becomes a standard.\n\nThe hardest part is picking the first surface. Every team has decisions worth encoding. The question is whether they live in someone's head or somewhere agents can find them. 
If you build something using this pattern or have questions about how we set it up, let us know.", "url": "https://wpnews.pro/news/teaching-agents-product-design-at-vercel", "canonical_source": "https://vercel.com/blog/teaching-agents-product-design-at-vercel", "published_at": "2026-06-24 07:00:00+00:00", "updated_at": "2026-06-25 20:13:39.170917+00:00", "lang": "en", "topics": ["ai-agents", "developer-tools", "ai-tools"], "entities": ["Vercel", "GitHub", "Figma", "Slack"], "alternates": {"html": "https://wpnews.pro/news/teaching-agents-product-design-at-vercel", "markdown": "https://wpnews.pro/news/teaching-agents-product-design-at-vercel.md", "text": "https://wpnews.pro/news/teaching-agents-product-design-at-vercel.txt", "jsonld": "https://wpnews.pro/news/teaching-agents-product-design-at-vercel.jsonld"}}