# I Built a Python Pipeline That Drafts Affiliate Articles Locally with Claude — Here's the Code, the 41-Second Run, and the Bug T

> Source: <https://dev.to/_7fb6011b57d383122b5a/i-built-a-python-pipeline-that-drafts-affiliate-articles-locally-with-claude-heres-the-code-the-50ib>
> Published: 2026-06-04 22:41:53+00:00

If you read this, you'll be able to run a small Python pipeline on your own laptop that: (1) generates a draft article from a topic + a keyword list, (2) injects your affiliate links **only where they're contextually relevant**, and (3) refuses to save anything where the title doesn't match the body. No SaaS, no cron server: just `python pipeline.py "Laravel N+1"` and a Markdown file lands in `out/`.

I run this every morning. Over 6 weeks it produced 17 drafts; my honest conversion is still low (think *single-digit clicks*, not "¥100,000 a month"), but the **machinery** works and the failure modes are interesting. This is the build log, not a get-rich post.

The whole thing is ~180 lines. The non-obvious design decision: **the LLM never touches your affiliate links.** Claude writes prose; a deterministic Python step does link insertion. Why? Because the first version let the model embed links, and Claude happily invented `https://amzn.to/laravel-pro`, a URL that does not exist. Hallucinated affiliate links are worse than no links: they leak trust and earn nothing.
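Since a prompt can only ask the model not to emit URLs, a cheap defensive check is to scan the returned text for anything URL-shaped before link injection runs. This is a minimal sketch, assuming the `{title, sections[]}` shape used in this post; `find_leaked_urls` and the regex are illustrative and not part of the original pipeline:

```python
import re

# Illustrative pattern: http(s) links and bare "www." hosts. Not exhaustive.
URL_RE = re.compile(r"(?:https?://|www\.)[^\s)>\]]+", re.IGNORECASE)

def find_leaked_urls(article: dict) -> list[str]:
    """Return every URL-like string found in the title or any section."""
    texts = [article.get("title", "")]
    for sec in article.get("sections", []):
        texts.append(sec.get("h2", ""))
        texts.append(sec.get("body_md", ""))
    found = []
    for text in texts:
        found.extend(URL_RE.findall(text))
    return found
```

A non-empty result is a reason to inspect the draft rather than an automatic rejection, because runnable code samples can legitimately contain something like `http://localhost`.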

So the contract is: the model (`claude-opus-4-8` via the Anthropic SDK) returns structured data shaped like `{title, sections[], keywords_used[]}`, and deterministic Python handles everything after that.

Here is the generation core. It uses the Anthropic Messages API with a forced JSON shape via a tool definition. That's the reliable way to get structured output, far better than "please return JSON" in the prompt.

``` python
# pipeline.py  (Python 3.11)
import json, os, re, sys, pathlib
from anthropic import Anthropic

client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
MODEL = "claude-opus-4-8"

ARTICLE_TOOL = {
    "name": "emit_article",
    "description": "Return the drafted technical article as structured data.",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "sections": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "h2": {"type": "string"},
                        "body_md": {"type": "string"},
                    },
                    "required": ["h2", "body_md"],
                },
            },
            "keywords_used": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["title", "sections", "keywords_used"],
    },
}

def draft(topic: str, keywords: list[str]) -> dict:
    prompt = (
        f"You are a senior backend engineer. Write a hands-on article on: {topic}.\n"
        f"Each H2 must contain at least one of these search keywords: {keywords}.\n"
        "Include real numbers and one runnable code block per section. "
        "Do NOT include any URLs or affiliate links — leave linking to the pipeline."
    )
    resp = client.messages.create(
        model=MODEL,
        max_tokens=4000,
        tools=[ARTICLE_TOOL],
        tool_choice={"type": "tool", "name": "emit_article"},
        messages=[{"role": "user", "content": prompt}],
    )
    for block in resp.content:
        if block.type == "tool_use":
            return block.input
    raise RuntimeError("model did not call emit_article")
```

`tool_choice` forcing `emit_article` is the part that took me three tries to get right. Without it, ~1 in 8 runs returned a chatty text block ("Sure! Here's your article...") and my `json.loads` blew up. Forcing the tool dropped that failure rate to zero across the last 60 runs.
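Forcing the tool gets you a tool call, but it may still be worth checking the returned dict before the rest of the pipeline trusts it, for instance if generation is cut off at `max_tokens` and a field comes back empty. A sketch of such a check follows; the helper name `article_problems` and the specific rules are assumptions layered on the `emit_article` schema above:

```python
def article_problems(art: dict, keywords: list[str]) -> list[str]:
    """Collect human-readable problems; an empty list means the draft looks usable."""
    problems = []
    title = art.get("title")
    if not isinstance(title, str) or not title.strip():
        problems.append("missing or empty title")
    sections = art.get("sections")
    if not isinstance(sections, list) or not sections:
        problems.append("no sections")
        return problems
    lowered = [k.lower() for k in keywords]
    for i, sec in enumerate(sections):
        sec = sec if isinstance(sec, dict) else {}
        h2 = str(sec.get("h2") or "")
        body = str(sec.get("body_md") or "")
        if not h2.strip() or not body.strip():
            problems.append(f"section {i}: empty h2 or body_md")
        elif not any(k in h2.lower() for k in lowered):
            # Mirrors the prompt rule that each H2 carries a search keyword.
            problems.append(f"section {i}: h2 contains none of the keywords")
    return problems
```

Calling this right after `draft()` and refusing to continue on a non-empty list keeps malformed output from reaching the link injector.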

This is the boring part that actually protects revenue. I keep a hand-written table of links I'm *actually* registered for (A8.net, an affiliate-enabled book retailer, etc.), each with a list of trigger keywords. Python inserts a link only when a section genuinely discusses that topic, and never more than one per ~400 words — because a wall of affiliate links is the fastest way to get a reader to bounce and an editor to flag spam.

``` python
# links.py
LINK_TABLE = [
    {
        "triggers": ["n+1", "eloquent", "query log", "eager loading"],
        "anchor": "a practical Laravel performance book",
        "url": "https://example-a8-link/laravel-perf",  # your real A8 tracking URL
    },
    {
        "triggers": ["new nisa", "index fund", "brokerage"],
        "anchor": "open a tsumitate NISA account",
        "url": "https://example-a8-link/nisa",
    },
]

def inject_links(body_md: str) -> tuple[str, int]:
    words = max(len(body_md.split()), 1)
    budget = max(1, words // 400)          # ~1 link per 400 words, but always at least 1 per section
    low = body_md.lower()
    inserted = 0
    for link in LINK_TABLE:
        if inserted >= budget:
            break
        if any(t in low for t in link["triggers"]):
            md_link = f"[{link['anchor']}]({link['url']})"
            body_md += f"\n\n> 📚 Related: {md_link}"
            inserted += 1
    return body_md, inserted
```

Measured behavior on my last 17 drafts: average **1.3 links per article**, and 4 articles got **zero** links because no section matched a trigger, which is exactly what I want. An off-topic affiliate link converts at ~0% and costs you credibility. Letting an article ship with zero links is a feature.
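One caveat about the matching itself: `t in low` is a plain substring test, so a trigger like `index fund` would also fire on a section about "index fundamentals". If that ever produces an off-topic link, one option is to match triggers on word boundaries instead. This is a sketch of that alternative, not what the pipeline above does; `trigger_hits` is a hypothetical helper:

```python
import re

def trigger_hits(body_md: str, triggers: list[str]) -> list[str]:
    """Return the triggers that occur as whole words or phrases in body_md."""
    low = body_md.lower()
    hits = []
    for trig in triggers:
        # Lookarounds instead of \b so triggers with symbols (e.g. "n+1") still work.
        pattern = r"(?<!\w)" + re.escape(trig.lower()) + r"(?!\w)"
        if re.search(pattern, low):
            hits.append(trig)
    return hits
```

Swapping this in would mean replacing the `any(t in low ...)` condition with `trigger_hits(body_md, link["triggers"])`, at the cost of missing plurals and other inflected forms.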

Here's the failure story. Early on, my title prompt and my body prompt were two separate Claude calls. On three mornings the title said *"Laravel Eloquent N+1"* while the body had drifted into *MySQL index design* — because the second call had no memory of the first. I didn't notice until a reader DMed me "the title is lying." Mortifying.

Fix: one call returns both (already done above), **plus** a deterministic gate that runs before anything is written to disk. If fewer than 2 meaningful title tokens appear in the body, the draft is rejected — no file, non-zero exit code, loud message.

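``` python
# pipeline.py (continued)
from links import inject_links  # link table and injector live in links.py

STOP = {"the", "a", "to", "in", "with", "and", "of", "for", "how", "i"}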

def title_matches_body(title: str, body: str) -> bool:
    toks = [t for t in re.findall(r"[a-z0-9+]+", title.lower()) if t not in STOP]
    body_low = body.lower()
    hits = sum(1 for t in toks if t in body_low)
    return hits >= 2          # require 2+ real title tokens in the body

def build(topic: str, keywords: list[str]) -> pathlib.Path:
    art = draft(topic, keywords)
    parts = [f"# {art['title']}\n"]
    for sec in art["sections"]:
        body, n = inject_links(sec["body_md"])
        parts.append(f"## {sec['h2']}\n\n{body}\n")
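    full = "\n".join(parts)
    # Check only the sections: `full` starts with the H1, so the title would always match itself.
    body_only = "\n".join(parts[1:])

    if not title_matches_body(art["title"], body_only):
        raise SystemExit(f"REJECTED: title/body drift -> {art['title']!r}")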

    slug = re.sub(r"[^a-z0-9]+", "-", art["title"].lower()).strip("-")[:60]
    out = pathlib.Path("out") / f"{slug}.md"
    out.parent.mkdir(exist_ok=True)
    out.write_text(full, encoding="utf-8")
    return out

if __name__ == "__main__":
    topic = sys.argv[1] if len(sys.argv) > 1 else "Laravel Eloquent N+1"
    kws = ["eloquent", "whereHas", "eager loading", "query log"]
    path = build(topic, kws)
    print(f"wrote {path}")
```

Since adding `title_matches_body`, the gate has rejected **2 of the last 31 runs**, both genuine drifts where Claude wandered off-topic in a long section. Two prevented embarrassments for the cost of a 5-line function. The `>= 2` threshold matters: at `>= 1`, a single accidental token like "the" (before I added the stoplist) passed garbage; at `>= 3`, legitimate short titles got rejected. Two is the sweet spot for my title lengths.
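To make the threshold trade-off concrete, the sketch below counts hits the same way `title_matches_body` does and compares an on-topic body with a drifted one. The sample title and bodies are invented for illustration:

```python
import re

STOP = {"the", "a", "to", "in", "with", "and", "of", "for", "how", "i"}

def title_hits(title: str, body: str) -> int:
    """Count non-stopword title tokens that appear somewhere in the body."""
    toks = [t for t in re.findall(r"[a-z0-9+]+", title.lower()) if t not in STOP]
    body_low = body.lower()
    return sum(1 for t in toks if t in body_low)

title = "How to Fix the Laravel Eloquent N+1 Problem"
on_topic = "Eloquent triggers an N+1 when a relation is lazy-loaded in Laravel."
drifted = "A composite MySQL index can fix a slow ORDER BY on large tables."

print(title_hits(title, on_topic))  # 3: laravel, eloquent, n+1
print(title_hits(title, drifted))   # 1: only the generic word "fix"
```

With `>= 1` the drifted body would pass on "fix" alone; `>= 2` rejects it while the on-topic body still clears the bar.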

On an M-class / Ryzen laptop the bottleneck is entirely the API round-trip, not Python. Generation is capped at `max_tokens=4000`, and a run usually uses ~3,800 of them.

I deliberately do **not** fan out 10 topics in parallel. One article a day, hand-reviewed before posting, keeps quality up and keeps me off platform spam filters, which is the real constraint, not throughput. The machine *could* do 30 in 20 minutes; that's exactly the trap that gets accounts flagged.
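To check on your own machine that the API round-trip dominates, a small timing wrapper is enough. This sketch uses only the standard library; `timed` and the stage names are hypothetical, and the stand-in statements would be replaced by the real `draft` and `inject_links` calls:

```python
import time
from contextlib import contextmanager

timings: dict[str, float] = {}

@contextmanager
def timed(stage: str):
    """Add the wall-clock seconds spent inside the with-block to timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start

with timed("api_call"):
    time.sleep(0.05)          # stand-in for draft(topic, keywords)
with timed("link_injection"):
    sum(range(1000))          # stand-in for inject_links(...)

for stage, secs in timings.items():
    print(f"{stage}: {secs:.3f}s")
```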

The local script is the unit; GitHub Actions is just a free cron that runs it and commits the result. The keys live in repo secrets, never in the file. Cost note: at current Opus pricing, ~3,800 output tokens is a few cents per run — call it the price of a vending-machine coffee per **month**, not per article.
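The monthly figure is simple arithmetic once you plug in a rate. The sketch below keeps the price as a parameter; the `10.0` in the example call is a placeholder and not a quoted price, so substitute the current per-million-token output rate for your model. Input tokens are left out because the prompt here is short:

```python
def monthly_cost_usd(output_tokens_per_run: int,
                     usd_per_million_output: float,
                     runs_per_month: int = 30) -> float:
    """Estimate monthly spend on output tokens at one run per day."""
    per_run = output_tokens_per_run * usd_per_million_output / 1_000_000
    return per_run * runs_per_month

# Placeholder rate for illustration only; check the provider's pricing page.
print(round(monthly_cost_usd(3_800, usd_per_million_output=10.0), 2))
```

The SDK response reports the real count as `resp.usage.output_tokens`, which is a better input than the ~3,800 estimate.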

``` yaml
# .github/workflows/daily.yml
name: daily-draft
on:
  schedule:
    - cron: "0 22 * * *"   # 22:00 UTC = 07:00 JST
  workflow_dispatch: {}
jobs:
  draft:
    runs-on: ubuntu-latest
    permissions:
      contents: write   # the default GITHUB_TOKEN can be read-only; git push needs write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "3.11" }
      - run: pip install anthropic
      - env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: python pipeline.py "Laravel Eloquent N+1 query optimization"
      - run: |
          git config user.name "draft-bot"
          git config user.email "bot@users.noreply.github.com"
          git add out/ && git commit -m "daily draft" || echo "nothing to commit"
          git push
```

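The `|| echo "nothing to commit"` part is load-bearing: when a run leaves nothing new under `out/`, `git commit` exits non-zero and the whole Action would go red for no good reason. Note that a rejected draft fails earlier than that: `pipeline.py` itself exits non-zero via `SystemExit`, so the Python step fails the job unless you mark it `continue-on-error: true` or otherwise tolerate that exit. Some handling along those lines is what keeps a *rejection* (correct behavior) from looking like a *failure*.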
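One way to let a scheduler tell a rejected draft apart from a crash is to give rejections their own exit code. This is a sketch under that assumption: `DraftRejected`, `EXIT_REJECTED`, and `run` are hypothetical names, and the value `3` is arbitrary:

```python
import sys

EXIT_OK = 0
EXIT_REJECTED = 3   # arbitrary; any code distinct from 0 and 1 works

class DraftRejected(Exception):
    """Raised by the validation gate when title and body have drifted."""

def run(build_fn, *args) -> int:
    """Call build_fn and map its outcome to a process exit code."""
    try:
        path = build_fn(*args)
    except DraftRejected as exc:
        print(f"REJECTED: {exc}", file=sys.stderr)
        return EXIT_REJECTED
    print(f"wrote {path}")
    return EXIT_OK

def fake_build_ok(topic):        # stand-in for build() in this sketch
    return f"out/{topic}.md"

def fake_build_drift(topic):     # stand-in for a build() that hits the gate
    raise DraftRejected(f"title/body drift on {topic!r}")

print(run(fake_build_ok, "n-plus-1"))
print(run(fake_build_drift, "n-plus-1"))
```

In `pipeline.py` this would end with `sys.exit(run(build, topic, kws))`, with the gate raising `DraftRejected` instead of `SystemExit`. Unexpected exceptions still propagate and exit with a traceback, so a CI step can tolerate code 3 while going red on anything else.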

Blunt truth from 6 weeks: the pipeline is the easy 20%. **Distribution is the other 80%**, and code can't fake it. My drafts that got read were the ones where the topic matched the platform's audience (concrete Laravel/Python implementation posts on a dev-heavy platform), not the generic ones. The automation's real value isn't "passive income" — it's removing the 40-minute cold-start of staring at a blank editor, so I'll actually publish 5 days a week instead of 1.

If you build this, steal three ideas specifically: **(1)** force structured output with `tool_choice` so you never parse free text; **(2)** keep affiliate links in deterministic Python, never in the prompt, so the model can't hallucinate a payout URL; **(3)** add a title↔body gate before any write. It's the cheapest insurance against shipping something that lies to your readers.

The full ~180-line version, plus the link table format, is the same shape as above — copy the three functions and you have a working draft generator today. If you want to go deeper on the query-optimization side that these drafts target, [a practical Laravel performance book](https://example-a8-link/laravel-perf) is the one I keep open while editing.
