Marketing skill for Claude with 26 evals – +20pp over baseline

A new marketing skill for Claude, built on a progressive disclosure architecture, achieved a mean pass rate of 82.7% across 26 evaluations, outperforming the baseline by 20.4 percentage points. The skill uses a lightweight routing layer to load nine specialist playbooks only when relevant, avoiding context bloat while delivering specialist-level copywriting, brand messaging, and competitive analysis. It works across Claude Code, Claude.ai, the Claude API, Cursor, Codex CLI, and Gemini CLI, and is available as an open-source install.

Unlike single-file marketing prompts, this skill uses progressive disclosure : a lightweight routing layer SKILL.md that loads 9 purpose-built reference modules only when relevant. The depth of nine specialist playbooks, none of the context bloat. Works with Claude Code, Claude.ai, the Claude API, Cursor, Codex CLI, Gemini CLI , and anything that reads the open SKILL.md format. Before / After before--after Benchmarks benchmarks Install install What you get what-you-get Use cases use-cases How it works how-it-works The brief-first philosophy the-brief-first-philosophy The universal quality bar the-universal-quality-bar Evaluation evaluation Design principles design-principles FAQ faq Roadmap roadmap Contributing contributing License license Prompt: "I need words for the top of my homepage — it's a tool that schedules social posts." | ❌ Without the skill | ✅ With the skill | |---|---| "Welcome to the future of social media management Our powerful, all-in-one platform helps you streamline your workflow and supercharge your presence." | Assuming small businesses / solo creators managing their own social presence — adjust if you mean agencies.Headline: Schedule a week of social posts in 20 minutes.Subhead: Plan, schedule, and publish across every channel — so you show up consistently without living on your phone.CTA: Start scheduling free | | Vague adjectives, no audience, no CTA, says nothing only-this-product could say. | Outcome-led headline, real subhead + CTA, one stated assumption, sharpening questions offered after the draft. | That difference is measurable — see below. Each of 26 eval prompts was run with the skill and without it baseline , then graded against per-prompt assertions. Full method, per-eval table, and raw outputs in benchmarks/ /inerrata/brief/blob/main/benchmarks and . /inerrata/brief/blob/main/evals evals/ | Metric | With skill | Baseline | Delta | |---|---|---|---| Mean pass rate | 82.7% | 62.3% | +20.4 pp | | Evals where skill ≥ baseline | 23 / 26 | — | — | | Gates A & B + negative controls | ✅ correct | n/a | — | Where it moves the needle most: | Eval | With | Base | Why | |---|---|---|---| | positioning statement Gate A | 1.00 | 0.00 | asks the load-bearing questions instead of guessing | | honesty probe | 1.00 | 0.25 | placeholders, never fabricated testimonials/stats | | homepage hero | 1.00 | 0.25 | drafts outcome-led copy on a stated assumption | | competitor analysis | 1.00 | 0.33 | ends in a white-space gap, not just description | | pricing-page audit Gate B | 1.00 | 0.67 | demands the real asset before auditing | | cold email | 0.86 | 0.43 | one idea, one low-friction CTA, relevance-first | Negative controls "name my cat", "explain DNS", "thank-you note to grandma" correctly did not trigger the skill — no false positives. A few weak spots welcome-sequence, LinkedIn are tracked openly in benchmarks/README.md /inerrata/brief/blob/main/benchmarks/README.md . 📊 Browse every output and grade: open evals/review.html /inerrata/brief/blob/main/evals/review.html in a browser. macOS / Linux curl -fsSL https://raw.githubusercontent.com/inerrata/marketing-tool/main/install.sh | bash Windows PowerShell irm https://raw.githubusercontent.com/inerrata/marketing-tool/main/install.ps1 | iex Claude Code project-specific bash cp -r marketing-tool/ unpacked/marketing .claude/skills/marketing Download marketing.skill and upload it under Settings → Capabilities → Skills Pro/Max/Team/Enterprise . The file is just a zip of unpacked/marketing/ — repackage it yourself with any zip tool. | Module | Covers | |---|---| Copywriting | Ads, emails, landing/sales pages, homepages, CTAs, taglines, social posts, microcopy — PAS / AIDA / BAB / 4Ps / FAB, 10-point editing checklist | Brand & messaging | Positioning, value propositions, messaging hierarchy, brand voice & tone, naming, message testing | Content strategy | Content pillars, funnel mapping, calendars, content ratios, repurposing, distribution, audits | Campaigns & GTM | Campaign briefs, concept development, phased launches, go-to-market plans, asset checklists | Research | Personas, ICPs, segmentation, jobs-to-be-done, competitor gap analysis, voice-of-customer | SEO | Search intent, keyword strategy, content briefs, on-page optimization, E-E-A-T | Email & lifecycle | Welcome / nurture / sales / onboarding / win-back / abandoned-cart sequences, newsletters, deliverability | CRO | Conversion audits, the conversion equation, A/B testing methodology, high-impact fixes | Measurement | Metrics by goal, AARRR funnel, CAC / LTV / ROAS formulas, attribution, reporting | Concrete scenarios, the prompt that triggers them, and what you get back. | You want to… | Say… | You get | |---|---|---| | Launch a product with no plan | "Help me launch my habit-tracking app." | Scoping questions, then a one-page brief + phased pre-/launch/post timeline + asset checklist | | First customers from zero | "Nobody knows my store exists. How do I get my first 100 customers?" | A focused acquisition plan — one channel/motion to start, matched to audience | | GTM as a solo founder | "Launching a paid Notion-template business next month, no audience — what's my plan?" | A sequenced GTM plan research + channel + content + email + launch , one motion first | | You want to… | Say… | You get | |---|---|---| | Cold outreach email | "Cold email to SaaS founders for my churn-reduction tool." | Subject + preview, relevance-first opening, one idea, one low-friction CTA | | Homepage hero | "Words for the top of my homepage — a tool that schedules social posts." | Headline + subhead + CTA options, outcome-led, on a stated assumption | | Subject-line options | "3 Black Friday subject lines for a coffee subscription." | Three distinct angles curiosity / benefit / urgency , labeled, length-checked | | Avoid fake proof | "Write a landing page with impressive testimonials and stats." | The page, with marked placeholders + why fabricating proof breaks trust and law | | You want to… | Say… | You get | |---|---|---| | Nail positioning | "Positioning statement for a PM tool aimed at agencies." | 2–3 sharp questions first, then a defensible statement — not a guess | | Define brand voice | "Help me define my brand voice." | Short discovery, then a voice profile with do/don't dimensions and tone-by-moment | | Value proposition | "Meal-prep for busy parents, ready in 5 min, no ultra-processed — write my value prop." | An outcome-led value prop only your brand could say | | You want to… | Say… | You get | |---|---|---| | Content calendar | "Content calendar for a B2B fintech startup." | 3–5 pillars, funnel mapping, value-heavy mix, calendar w/ pillar/stage/format/title/CTA | | SEO brief | "SEO brief targeting 'best crm for nonprofits'." | Intent → format, title/meta, outline, People-Also-Ask, E-E-A-T, with a real-data flag | | Welcome sequence | "Design a welcome sequence for my newsletter." | 3–5 email flow with timing + one CTA each | | Win-back | "Win-back sequence for users inactive 60 days." | Re-engagement flow + suppression of non-responders to protect deliverability | | Audit a page | "My pricing page isn't converting — audit it." | Asks for the real page first, then a structured audit + prioritized fixes | | What to measure | "What should I track for a new paid ads channel?" | Primary metric CAC/ROAS , leading indicators, formulas, vanity-metric + attribution caveats | | Unit economics | "Is a $400 CAC good if customers pay $50/mo for ~18 months?" | LTV + LTV:CAC math shown, read against ~3:1, with the caveats | marketing/ ├── SKILL.md ← routing layer, brief-first gates, quality bar └── references/ ├── copywriting.md ├── research.md ├── email-lifecycle.md ├── brand-messaging.md ├── seo.md ├── cro.md ├── content-strategy.md ├── campaigns.md └── measurement.md At session start, only the SKILL.md description ~100 tokens is in context. When a marketing task is detected, the full SKILL.md loads and routes to the relevant module s ; unused modules never load. Multi-area requests pull several and synthesize. Two hard gates enforce quality: Gate A — Strategy foundations positioning, value prop, voice, GTM : asks 2–3 sharp questions before writing. These are load-bearing — everything downstream inherits their flaws. Gate B — Audits / "improve my X" : requests the actual asset first. It won't invent a page's contents and critique its own invention. Everything else scales to stakes : big ambiguous strategy work gets a couple of questions first; a concrete copy deliverable a hero, an ad, subject lines is drafted immediately on a stated assumption, with sharpening questions after — a draft you can react to beats an interrogation. Before producing anything, the skill establishes or explicitly infers and flags five things: Audience — who specifically, and what do they believe now? Goal — the one action this should drive. Offer — what's marketed, and the single most important thing about it. Proof — the evidence behind the claims. Constraint — channel, length, tone, brand rules. Audience and goal are never silently guessed. When the skill assumes, it says so "Assuming X ; adjust if wrong" so you can correct it in one line. Checked against every deliverable before it ships: specificity numbers, not adjectives · reader-first · one message per piece · proof over claims · clean mechanics active voice, cut filler · earn the next line · honesty no fabricated stats, testimonials, or credentials . The skill ships with its own test suite in evals/ /inerrata/brief/blob/main/evals : 26 prompts covering every module, phrasing variations, both gate checks, multi-module synthesis, an implicit trigger, an honesty probe, and negative controls prompts that should not trigger it . The harness runs each prompt with and without the skill , records whether the skill chose to trigger available, not forced , and grades every output against per-prompt assertions — measuring triggering accuracy and output quality at once. - 📊 Aggregate + per-eval table → benchmarks/README.md - 🧪 Eval set + harness + raw results → evals/ - 🖥️ Click-through viewer → evals/review.html Reproduce it in Claude Code with the skill-creator skill: Use the skill-creator skill to run a full eval pass on ./ unpacked/marketing using ./evals/evals.json. Run with-skill and baseline for each prompt, grade against expected output, and generate the benchmark viewer. Specificity over cleverness — numbers and outcomes, not vague adjectives Reader-first — leads with the audience's problem, not the product's features Brief before output — never silently guesses on audience or goal Proof over claims — every significant claim needs support One message per piece — every asset has exactly one job Draft-first for copy, questions-first for strategy — match friction to stakes Honest by default — never fabricates proof; surfaces advertising/email-compliance considerations Does it need an API key or dependencies? No — plain Markdown in the SKILL.md format. Nothing to run or install. Why does it sometimes ask questions instead of answering? For strategy foundations and audits, a confident guess is worse than a question. For ordinary copy, it drafts first and asks after. Will it invent testimonials or stats if I ask? No. It uses marked placeholders and explains why fabricated proof breaks trust and advertising law. Does it work outside Claude? Yes — any tool that reads SKILL.md Cursor, Codex CLI, Gemini CLI, … . How is this different from "be a marketer"? Frameworks, gates, and a quality bar make output consistent and grounded — and the eval suite lets you verify it instead of trusting vibes. Can I customize it? Yes — edit the reference modules for your brand rules/voice, and add eval prompts to keep it honest. - More reference modules paid-media buying, partnerships/influencer, PR & comms, localization - A brand-profile file the skill reads so output inherits your voice automatically - Multi-run 3× eval pass with variance reporting; fix the tracked weak spots 15, 8 - More worked before/after examples Star if useful. PRs and issues welcome. - Keep SKILL.md lean — it's always loaded; depth goes in references/ - Add eval prompts in evals/evals.json for any new module happy path + a phrasing variation + gates - Run the eval set before and after your change so triggering and quality don't regress MIT — see LICENSE /inerrata/brief/blob/main/LICENSE .