{"slug": "marketing-skill-for-claude-with-26-evals-20pp-over-baseline", "title": "Marketing skill for Claude with 26 evals – +20pp over baseline", "summary": "A new marketing skill for Claude, built on a progressive disclosure architecture, achieved a mean pass rate of 82.7% across 26 evaluations, outperforming the baseline by 20.4 percentage points. The skill uses a lightweight routing layer to load nine specialist playbooks only when relevant, avoiding context bloat while delivering specialist-level copywriting, brand messaging, and competitive analysis. It works across Claude Code, Claude.ai, the Claude API, Cursor, Codex CLI, and Gemini CLI, and is available as an open-source install.", "body_md": "Unlike single-file marketing prompts, this skill uses **progressive disclosure**: a lightweight routing layer (`SKILL.md`\n\n) that loads 9 purpose-built reference modules only when relevant. The depth of nine specialist playbooks, none of the context bloat. Works with **Claude Code, Claude.ai, the Claude API, Cursor, Codex CLI, Gemini CLI**, and anything that reads the open `SKILL.md`\n\nformat.\n\n[Before / After](#before--after)[Benchmarks](#benchmarks)[Install](#install)[What you get](#what-you-get)[Use cases](#use-cases)[How it works](#how-it-works)[The brief-first philosophy](#the-brief-first-philosophy)[The universal quality bar](#the-universal-quality-bar)[Evaluation](#evaluation)[Design principles](#design-principles)[FAQ](#faq)[Roadmap](#roadmap)[Contributing](#contributing)[License](#license)\n\n**Prompt:** *\"I need words for the top of my homepage — it's a tool that schedules social posts.\"*\n\n| ❌ Without the skill | ✅ With the skill |\n|---|---|\n\"Welcome to the future of social media management! Our powerful, all-in-one platform helps you streamline your workflow and supercharge your presence.\" |\nAssuming small businesses / solo creators managing their own social presence — adjust if you mean agencies.Headline: Schedule a week of social posts in 20 minutes.Subhead: Plan, schedule, and publish across every channel — so you show up consistently without living on your phone.CTA: Start scheduling free |\n| Vague adjectives, no audience, no CTA, says nothing only-this-product could say. | Outcome-led headline, real subhead + CTA, one stated assumption, sharpening questions offered after the draft. |\n\nThat difference is measurable — see below.\n\nEach of 26 eval prompts was run **with the skill and without it (baseline)**, then graded against per-prompt assertions. Full method, per-eval table, and raw outputs in [ benchmarks/](/inerrata/brief/blob/main/benchmarks) and\n\n[.](/inerrata/brief/blob/main/evals)\n\n`evals/`\n\n| Metric | With skill | Baseline | Delta |\n|---|---|---|---|\nMean pass rate |\n82.7% |\n62.3% | +20.4 pp |\n| Evals where skill ≥ baseline | 23 / 26 | — | — |\n| Gates (A & B) + negative controls | ✅ correct | n/a | — |\n\n**Where it moves the needle most:**\n\n| Eval | With | Base | Why |\n|---|---|---|---|\n| positioning statement (Gate A) | 1.00 | 0.00 | asks the load-bearing questions instead of guessing |\n| honesty probe | 1.00 | 0.25 | placeholders, never fabricated testimonials/stats |\n| homepage hero | 1.00 | 0.25 | drafts outcome-led copy on a stated assumption |\n| competitor analysis | 1.00 | 0.33 | ends in a white-space gap, not just description |\n| pricing-page audit (Gate B) | 1.00 | 0.67 | demands the real asset before auditing |\n| cold email | 0.86 | 0.43 | one idea, one low-friction CTA, relevance-first |\n\n**Negative controls** (\"name my cat\", \"explain DNS\", \"thank-you note to grandma\") correctly did **not** trigger the skill — no false positives. A few weak spots (welcome-sequence, LinkedIn) are tracked openly in [ benchmarks/README.md](/inerrata/brief/blob/main/benchmarks/README.md).\n\n📊 **Browse every output and grade:** open [ evals/review.html](/inerrata/brief/blob/main/evals/review.html) in a browser.\n\n```\n# macOS / Linux\ncurl -fsSL https://raw.githubusercontent.com/inerrata/marketing-tool/main/install.sh | bash\n\n# Windows (PowerShell)\nirm https://raw.githubusercontent.com/inerrata/marketing-tool/main/install.ps1 | iex\n\n### Claude Code (project-specific)\n\n``` bash\ncp -r marketing-tool/_unpacked/marketing .claude/skills/marketing\n```\n\nDownload `marketing.skill`\n\nand upload it under **Settings → Capabilities → Skills** (Pro/Max/Team/Enterprise). The file is just a zip of `_unpacked/marketing/`\n\n— repackage it yourself with any zip tool.\n\n| Module | Covers |\n|---|---|\nCopywriting |\nAds, emails, landing/sales pages, homepages, CTAs, taglines, social posts, microcopy — PAS / AIDA / BAB / 4Ps / FAB, 10-point editing checklist |\nBrand & messaging |\nPositioning, value propositions, messaging hierarchy, brand voice & tone, naming, message testing |\nContent strategy |\nContent pillars, funnel mapping, calendars, content ratios, repurposing, distribution, audits |\nCampaigns & GTM |\nCampaign briefs, concept development, phased launches, go-to-market plans, asset checklists |\nResearch |\nPersonas, ICPs, segmentation, jobs-to-be-done, competitor gap analysis, voice-of-customer |\nSEO |\nSearch intent, keyword strategy, content briefs, on-page optimization, E-E-A-T |\nEmail & lifecycle |\nWelcome / nurture / sales / onboarding / win-back / abandoned-cart sequences, newsletters, deliverability |\nCRO |\nConversion audits, the conversion equation, A/B testing methodology, high-impact fixes |\nMeasurement |\nMetrics by goal, AARRR funnel, CAC / LTV / ROAS formulas, attribution, reporting |\n\nConcrete scenarios, the prompt that triggers them, and what you get back.\n\n| You want to… | Say… | You get |\n|---|---|---|\n| Launch a product with no plan | \"Help me launch my habit-tracking app.\" |\nScoping questions, then a one-page brief + phased pre-/launch/post timeline + asset checklist |\n| First customers from zero | \"Nobody knows my store exists. How do I get my first 100 customers?\" |\nA focused acquisition plan — one channel/motion to start, matched to audience |\n| GTM as a solo founder | \"Launching a paid Notion-template business next month, no audience — what's my plan?\" |\nA sequenced GTM plan (research + channel + content + email + launch), one motion first |\n\n| You want to… | Say… | You get |\n|---|---|---|\n| Cold outreach email | \"Cold email to SaaS founders for my churn-reduction tool.\" |\nSubject + preview, relevance-first opening, one idea, one low-friction CTA |\n| Homepage hero | \"Words for the top of my homepage — a tool that schedules social posts.\" |\nHeadline + subhead + CTA options, outcome-led, on a stated assumption |\n| Subject-line options | \"3 Black Friday subject lines for a coffee subscription.\" |\nThree distinct angles (curiosity / benefit / urgency), labeled, length-checked |\n| Avoid fake proof | \"Write a landing page with impressive testimonials and stats.\" |\nThe page, with marked placeholders + why fabricating proof breaks trust and law |\n\n| You want to… | Say… | You get |\n|---|---|---|\n| Nail positioning | \"Positioning statement for a PM tool aimed at agencies.\" |\n2–3 sharp questions first, then a defensible statement — not a guess |\n| Define brand voice | \"Help me define my brand voice.\" |\nShort discovery, then a voice profile with do/don't dimensions and tone-by-moment |\n| Value proposition | \"Meal-prep for busy parents, ready in 5 min, no ultra-processed — write my value prop.\" |\nAn outcome-led value prop only your brand could say |\n\n| You want to… | Say… | You get |\n|---|---|---|\n| Content calendar | \"Content calendar for a B2B fintech startup.\" |\n3–5 pillars, funnel mapping, value-heavy mix, calendar w/ pillar/stage/format/title/CTA |\n| SEO brief | \"SEO brief targeting 'best crm for nonprofits'.\" |\nIntent → format, title/meta, outline, People-Also-Ask, E-E-A-T, with a real-data flag |\n| Welcome sequence | \"Design a welcome sequence for my newsletter.\" |\n3–5 email flow with timing + one CTA each |\n| Win-back | \"Win-back sequence for users inactive 60 days.\" |\nRe-engagement flow + suppression of non-responders to protect deliverability |\n| Audit a page | \"My pricing page isn't converting — audit it.\" |\nAsks for the real page first, then a structured audit + prioritized fixes |\n| What to measure | \"What should I track for a new paid ads channel?\" |\nPrimary metric (CAC/ROAS), leading indicators, formulas, vanity-metric + attribution caveats |\n| Unit economics | \"Is a $400 CAC good if customers pay $50/mo for ~18 months?\" |\nLTV + LTV:CAC math shown, read against ~3:1, with the caveats |\n\n```\nmarketing/\n├── SKILL.md              ← routing layer, brief-first gates, quality bar\n└── references/\n    ├── copywriting.md      ├── research.md         ├── email-lifecycle.md\n    ├── brand-messaging.md  ├── seo.md              ├── cro.md\n    ├── content-strategy.md ├── campaigns.md        └── measurement.md\n```\n\nAt session start, only the `SKILL.md`\n\n**description** (~100 tokens) is in context. When a marketing task is detected, the full `SKILL.md`\n\nloads and routes to the relevant module(s); unused modules never load. Multi-area requests pull several and synthesize.\n\nTwo hard gates enforce quality:\n\n**Gate A — Strategy foundations**(positioning, value prop, voice, GTM): asks 2–3 sharp questions before writing. These are load-bearing — everything downstream inherits their flaws.**Gate B — Audits / \"improve my X\"**: requests the actual asset first. It won't invent a page's contents and critique its own invention.\n\nEverything else **scales to stakes**: big ambiguous strategy work gets a couple of questions first; a concrete copy deliverable (a hero, an ad, subject lines) is drafted immediately on a stated assumption, with sharpening questions *after* — a draft you can react to beats an interrogation.\n\nBefore producing anything, the skill establishes (or explicitly infers and flags) five things:\n\n**Audience**— who specifically, and what do they believe now?** Goal**— the one action this should drive.** Offer**— what's marketed, and the single most important thing about it.** Proof**— the evidence behind the claims.** Constraint**— channel, length, tone, brand rules.\n\nAudience and goal are never *silently* guessed. When the skill assumes, it says so (\"Assuming [X]; adjust if wrong\") so you can correct it in one line.\n\nChecked against every deliverable before it ships: **specificity** (numbers, not adjectives) · **reader-first** · **one message per piece** · **proof over claims** · **clean mechanics** (active voice, cut filler) · **earn the next line** · **honesty** (no fabricated stats, testimonials, or credentials).\n\nThe skill ships with its own test suite in [ evals/](/inerrata/brief/blob/main/evals):\n\n**26 prompts** covering every module, phrasing variations, both gate checks, multi-module synthesis, an implicit trigger, an honesty probe, and\n\n**negative controls**(prompts that should\n\n*not*trigger it).\n\nThe harness runs each prompt **with and without the skill**, records whether the skill chose to trigger (*available, not forced*), and grades every output against per-prompt assertions — measuring **triggering accuracy** and **output quality** at once.\n\n- 📊 Aggregate + per-eval table →\n`benchmarks/README.md`\n\n- 🧪 Eval set + harness + raw results →\n`evals/`\n\n- 🖥️ Click-through viewer →\n`evals/review.html`\n\nReproduce it in Claude Code with the `skill-creator`\n\nskill:\n\n```\nUse the skill-creator skill to run a full eval pass on ./_unpacked/marketing using ./evals/evals.json.\nRun with-skill and baseline for each prompt, grade against expected_output, and generate the benchmark viewer.\n```\n\n**Specificity over cleverness**— numbers and outcomes, not vague adjectives** Reader-first**— leads with the audience's problem, not the product's features** Brief before output**— never silently guesses on audience or goal** Proof over claims**— every significant claim needs support** One message per piece**— every asset has exactly one job** Draft-first for copy, questions-first for strategy**— match friction to stakes** Honest by default**— never fabricates proof; surfaces advertising/email-compliance considerations\n\n**Does it need an API key or dependencies?** No — plain Markdown in the `SKILL.md`\n\nformat. Nothing to run or install.\n\n**Why does it sometimes ask questions instead of answering?** For strategy foundations and audits, a confident guess is worse than a question. For ordinary copy, it drafts first and asks after.\n\n**Will it invent testimonials or stats if I ask?** No. It uses marked placeholders and explains why fabricated proof breaks trust and advertising law.\n\n**Does it work outside Claude?** Yes — any tool that reads `SKILL.md`\n\n(Cursor, Codex CLI, Gemini CLI, …).\n\n**How is this different from \"be a marketer\"?** Frameworks, gates, and a quality bar make output consistent and grounded — and the eval suite lets you verify it instead of trusting vibes.\n\n**Can I customize it?** Yes — edit the reference modules for your brand rules/voice, and add eval prompts to keep it honest.\n\n- More reference modules (paid-media buying, partnerships/influencer, PR & comms, localization)\n- A brand-profile file the skill reads so output inherits your voice automatically\n- Multi-run (3×) eval pass with variance reporting; fix the tracked weak spots (#15, #8)\n- More worked before/after examples\n\nStar if useful.\n\nPRs and issues welcome.\n\n- Keep\n`SKILL.md`\n\nlean — it's always loaded; depth goes in`references/`\n\n- Add eval prompts in\n`evals/evals.json`\n\nfor any new module (happy path + a phrasing variation + gates) - Run the eval set before and after your change so triggering and quality don't regress\n\nMIT — see [LICENSE](/inerrata/brief/blob/main/LICENSE).", "url": "https://wpnews.pro/news/marketing-skill-for-claude-with-26-evals-20pp-over-baseline", "canonical_source": "https://github.com/inerrata/brief", "published_at": "2026-05-30 19:36:46+00:00", "updated_at": "2026-05-30 20:17:13.461439+00:00", "lang": "en", "topics": ["ai-tools", "ai-products", "large-language-models"], "entities": ["Claude", "Claude Code", "Claude.ai", "Claude API", "Cursor", "Codex CLI", "Gemini CLI"], "alternates": {"html": "https://wpnews.pro/news/marketing-skill-for-claude-with-26-evals-20pp-over-baseline", "markdown": "https://wpnews.pro/news/marketing-skill-for-claude-with-26-evals-20pp-over-baseline.md", "text": "https://wpnews.pro/news/marketing-skill-for-claude-with-26-evals-20pp-over-baseline.txt", "jsonld": "https://wpnews.pro/news/marketing-skill-for-claude-with-26-evals-20pp-over-baseline.jsonld"}}