{"slug": "how-i-use-pluckmd-to-read-blogs-with-an-ai-agent", "title": "How I use pluckmd to read blogs with an AI agent", "summary": "A developer created pluckmd, a CLI tool that extracts blog articles into markdown without per-site configuration, to enable AI agents to read and index web content. The tool automatically handles JavaScript-heavy pages by switching to a real browser, supports logged-in sessions, and integrates with coding agents like Claude Code and Codex to automate the workflow of collecting articles, building wikis, and generating interactive HTML study pages.", "body_md": "I wanted to read blog posts with an LLM in the loop, not just on my own.\n\nThe push came from two places: Karpathy's LLM Wiki idea, where the model keeps a folder of markdown notes as you learn a topic, and Thariq's post on how well Claude generates interactive HTML, which is now on the Anthropic blog. Put together, the workflow I wanted looked like this: pull blog articles into markdown, have an agent index them into a wiki, then generate interactive HTML pages to learn from.\n\nStep one was the blocker. Getting clean articles out of a website kept breaking, and every tool wanted a config per site. So I made pluckmd to handle just that part. This post covers how I use it; the architecture write-up is separate.\n\nThe basic usage is a single command:\n\n```\nnpx pluckmd download https://example.com/blog -o ./articles\n```\n\nThat walks the listing page, follows pagination, pulls each article, and writes markdown with frontmatter (title, date, author, tags). On a small blog I get around five posts saved in a few seconds. No site config, no setup.\n\nIf a page is heavy on JavaScript, it quietly switches to a real browser to render it. You don't choose that; it decides on its own.\n\nA lot of the writing I actually care about sits behind a login. 
There are two ways to handle it.\n\n```\npluckmd login https://example.com/login\n```\n\nThat opens a browser once so you can log in by hand, and the session sticks around. After that, normal downloads just work.\n\nOr, if you'd rather not hand it credentials at all, open the page in Chrome with the extension installed and run:\n\n```\npluckmd download --active-tab -o ./articles\n```\n\nIt reads straight from the tab you're already logged into. The CLI itself never reads your cookies.\n\nThe agent integration is the real reason pluckmd exists for me. Most of the time I don't run the CLI by hand. pluckmd ships skills for Claude Code and Codex, so I just talk to the agent and it runs the right commands for me.\n\nThe whole learning loop is three messages:\n\n\"Collect the posts from https://example.com/blog\"\n\nThe agent runs the download and saves everything as markdown into `raw/`.\n\n\"Build a wiki from them\"\n\nIt reads the markdown, pulls out the concepts, and links them into wiki notes (the wiki works as an Obsidian vault). That's the Karpathy LLM Wiki part: a set of notes the model maintains as I learn.\n\n\"Generate interactive HTML for this concept\"\n\nIt turns a concept into an interactive HTML page to study from, which is the Thariq HTML idea. The raw files stay untouched; the wiki and the HTML are things the agent regenerates.\n\nSo I never touch flags or paths unless I want to. I describe what I want, and the agent drives pluckmd. Even if you don't have an LLM key set for the extraction itself, it still works: pluckmd writes out a file describing the page, and the agent reads that file and produces the extraction rules. The agent is the brain; the CLI is the hands.\n\nHonestly, not every site cooperates. I hit a couple of layouts where the heuristics couldn't find a clean article pattern and pluckmd had to lean on the agent fallback. Infinite-scroll feeds are hit or miss, depending on how the site wires up its load-more. 
If you try it on something exotic and it flops, let me know; that's useful to me.\n\nTo install it globally:\n\n```\nnpm install -g pluckmd\n```\n\nRepo (MIT): [https://github.com/taisei-ide-0123/pluckmd](https://github.com/taisei-ide-0123/pluckmd)\n\nI'm curious what people are pointing their agents at. What would you want read into a wiki first?", "url": "https://wpnews.pro/news/how-i-use-pluckmd-to-read-blogs-with-an-ai-agent", "canonical_source": "https://dev.to/taisei_ide/how-i-use-pluckmd-to-read-blogs-with-an-ai-agent-1jpe", "published_at": "2026-06-02 23:42:20+00:00", "updated_at": "2026-06-03 00:12:30.342655+00:00", "lang": "en", "topics": ["ai-tools", "ai-agents", "large-language-models"], "entities": ["pluckmd", "Karpathy", "Thariq", "Claude", "Anthropic"], "alternates": {"html": "https://wpnews.pro/news/how-i-use-pluckmd-to-read-blogs-with-an-ai-agent", "markdown": "https://wpnews.pro/news/how-i-use-pluckmd-to-read-blogs-with-an-ai-agent.md", "text": "https://wpnews.pro/news/how-i-use-pluckmd-to-read-blogs-with-an-ai-agent.txt", "jsonld": "https://wpnews.pro/news/how-i-use-pluckmd-to-read-blogs-with-an-ai-agent.jsonld"}}