# How I use pluckmd to read blogs with an AI agent

> Source: <https://dev.to/taisei_ide/how-i-use-pluckmd-to-read-blogs-with-an-ai-agent-1jpe>
> Published: 2026-06-02 23:42:20+00:00

I wanted to read blog posts with an LLM in the loop, not just on my own.

The push came from two places: Karpathy's LLM Wiki idea, where the model keeps a folder of markdown notes as you learn a topic, and Thariq's post on how well Claude generates interactive HTML, which is now on the Anthropic blog. Put together, the workflow I wanted looked like this: pull blog articles into markdown, have an agent index them into a wiki, then generate interactive HTML pages to learn from.

Step one was the blocker. Getting clean articles out of a website kept breaking, and every tool wanted a config per site. So I made pluckmd to handle just that part. This post is how I use it. The architecture write-up is separate.

```
npx pluckmd download https://example.com/blog -o ./articles
```

That walks the listing page, follows pagination, pulls each article, and writes markdown with frontmatter (title, date, author, tags). On a small blog I get maybe 5 posts saved in a few seconds. No site config, no setup.
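Because every file carries frontmatter, the saved folder is easy to index afterwards. Here is a minimal sketch of reading it back, assuming the frontmatter is a `---`-delimited block of flat `key: value` lines at the top of each file. The exact format pluckmd writes is not shown in this post, so that layout and the helper names `parse_frontmatter` and `index_articles` are assumptions for illustration, not part of pluckmd.

```python
from pathlib import Path


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse a leading '---' block of flat 'key: value' lines (assumed format)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the block
            return meta
        key, sep, value = line.partition(":")
        if sep and key.strip():
            # Split on the first colon only, so "title: Foo: bar" keeps "Foo: bar".
            meta[key.strip()] = value.strip().strip('"')
    return {}  # no closing delimiter: treat the file as having no frontmatter


def index_articles(folder: str) -> list[dict[str, str]]:
    """Collect frontmatter from every .md file in `folder`, sorted by date string."""
    rows = []
    for path in sorted(Path(folder).glob("*.md")):
        meta = parse_frontmatter(path.read_text(encoding="utf-8"))
        rows.append({"file": path.name, **meta})
    return sorted(rows, key=lambda row: row.get("date", ""))
```

Calling `index_articles("./articles")` would return one dictionary per saved post. Nested values such as a multi-line tag list are skipped by this sketch; a real YAML parser would be the better choice if you need them.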

If a page is heavy on JavaScript, it quietly switches to a real browser to render it. You don't pick that; it decides.

A lot of the writing I actually care about sits behind a login. Two ways to handle it.

```
pluckmd login https://example.com/login
```

That opens a browser once, you log in by hand, and the session sticks around. After that, normal downloads just work.

Or if you'd rather not hand it credentials at all, open the page in Chrome with the extension installed and run:

```
pluckmd download --active-tab -o ./articles
```

It reads straight from the tab you're already logged into. The CLI itself never reads your cookies.

This is the reason it exists for me. I don't actually run the CLI by hand most of the time. pluckmd ships skills for Claude Code and Codex, so I just talk to the agent and it runs the right commands for me.

The whole learning loop is three messages:

Collect the posts from https://example.com/blog

The agent runs the download and saves everything as markdown into `raw/`.

Build a wiki from them

It reads the markdown, pulls out the concepts, and links them into wiki notes (works as an Obsidian vault). That's the Karpathy LLM Wiki part, a set of notes the model maintains as I learn.
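To make the shape of that output concrete, here is a rough sketch of writing linked notes. In the real workflow the agent does the concept extraction and writes the notes itself; this only illustrates the result, assuming you already have a mapping from concept name to the raw files it appears in. The function `write_wiki_notes`, the note layout, and the "shares a source article" rule for relatedness are all hypothetical. The `[[Name]]` syntax is Obsidian's standard internal link.

```python
from pathlib import Path


def write_wiki_notes(concepts: dict[str, list[str]], wiki_dir: str) -> list[Path]:
    """Write one note per concept, linking concepts that share a source article.

    `concepts` maps a concept name to the raw article files it was found in.
    """
    out = Path(wiki_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, sources in sorted(concepts.items()):
        # Two concepts count as related when at least one raw file mentions both.
        related = sorted(
            other
            for other, other_sources in concepts.items()
            if other != name and set(sources) & set(other_sources)
        )
        body = [f"# {name}", "", "## Sources"]
        body += [f"- {src}" for src in sources]
        body += ["", "## Related"]
        body += [f"- [[{other}]]" for other in related]  # Obsidian-style wikilink
        note = out / f"{name}.md"
        note.write_text("\n".join(body) + "\n", encoding="utf-8")
        written.append(note)
    return written
```

The notes are rebuilt from the mapping on every run and the raw files are only referenced, never modified, which matches the split between untouched sources and regenerated output.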

Generate interactive HTML for this concept

It turns a concept into an interactive HTML page to study from, the Thariq HTML idea. The raw files stay untouched, the wiki and the HTML are things the agent regenerates.

So I never touch flags or paths unless I want to. I describe what I want, the agent drives pluckmd. And if you don't have an LLM key set for the extraction itself, it still works: pluckmd writes out a file describing the page, and the agent reads that and produces the extraction rules. The agent is the brain, the CLI is the hands.

Honestly, not every site cooperates. I hit a couple of layouts where the heuristics couldn't find a clean article pattern and it had to lean on the agent fallback. Infinite scroll feeds are hit or miss depending on how the load-more is wired up. If you try it on something exotic and it flops, that's useful to me.

```
npm install -g pluckmd
```

Repo (MIT): [https://github.com/taisei-ide-0123/pluckmd](https://github.com/taisei-ide-0123/pluckmd)

Curious what people are pointing their agents at. What would you want read into a wiki first?
