{"slug": "show-hn-crawlie-free-open-source-seo-audit-tool-for-humans-and-agents", "title": "Show HN: Crawlie – Free open-source SEO audit tool for humans and agents", "summary": "Spronta released Crawlie, a free, open-source SEO audit tool that crawls websites for broken links, redirects, and missing metadata and runs more than 40 SEO and generative-engine optimization (GEO) checks. It includes a CLI, a desktop app, and an MCP server for AI agents, gives plain-English guidance on every fix, and scores both technical SEO health and AI-search readiness.", "body_md": "**The fast, free, open-source technical SEO + GEO crawler, built for humans and agents.**\n\nCrawl any site for broken links, redirects, missing metadata, and 40+ SEO and Generative Engine Optimization (GEO) checks, with plain-English guidance on every fix. It runs locally, ships a CLI and an MCP server, and costs nothing.\n\n[Setup](#setup) ·\n[CLI](#how-to-use-cli) ·\n[MCP & agents](#use-with-agents-mcp) ·\n[Use cases](#use-cases) ·\n[Why I built this](#why-i-built-this) ·\n[Desktop app](#desktop-app) ·\n[Checks](#what-it-checks) ·\n[Compare](#how-it-compares) ·\n[Architecture](#architecture)\n\n*by Spronta*\n\n## Setup\n\n**The easy way: npm** (installs the `crawlie` CLI and the `crawlie-mcp` server):\n\n```\nnpm i -g @spronta/crawlie\n```\n\n**The macOS app:** grab the signed `.dmg` from [Releases](https://github.com/spronta/crawlie/releases).\n\n**From source:** requires [Rust](https://rustup.rs) (engine, CLI, and MCP server) and, for the desktop app, [pnpm](https://pnpm.io) + Node:\n\n```\ngit clone https://github.com/spronta/crawlie\ncd crawlie\ncargo build --release\n# → target/release/crawlie  and  target/release/crawlie-mcp\n\n# or install onto your PATH:\ncargo install --path crates/crawlie-cli      # installs `crawlie`\ncargo install --path crates/crawlie-mcp      # installs `crawlie-mcp`\n```\n\n**How it ships:** the prebuilt **CLI + MCP** binaries come **only** through npm; the right native binary installs automatically as a platform package (nothing to download or unblock). 
The **desktop app** is the only direct download: a Spronta-signed, notarized `.dmg` on [Releases](https://github.com/spronta/crawlie/releases).\n\n## How to use (CLI)\n\n```\n# Crawl a whole site (respects robots.txt, seeds from sitemap.xml)\ncrawlie crawl https://example.com --format pretty\n\n# Audit a single page, or a specific set of pages\ncrawlie audit https://example.com/pricing\ncrawlie audit https://example.com/a https://example.com/b\n\n# Save a shareable, self-contained HTML report\ncrawlie crawl https://example.com --format html -o report.html\n\n# Clean JSON for piping / scripting / agents (on stdout unless -o names a file)\ncrawlie crawl https://example.com --format json -o report.json\n\n# Learn why any finding matters and how to fix it\ncrawlie explain geo-not-answerable\n```\n\n**Output formats:** `pretty` (terminal), `json` (machine-readable; the default), `csv` (issues), and `html` (shareable file).\n\n**Common flags:**\n\n| Flag | What it does |\n|---|---|\n| `--max-pages <n>` | Cap the number of pages fetched (default 500) |\n| `--max-depth <n>` | Maximum click depth from the seed URL |\n| `--concurrency <n>` | Number of parallel requests (default 16) |\n| `--include <glob>` / `--exclude <glob>` | Scope the crawl by URL pattern |\n| `--no-robots` / `--no-sitemap` / `--no-external` | Ignore robots.txt, skip sitemap seeding, or skip external-link checks |\n| `--severity error\\|warning\\|notice` | Only output findings at or above the given level |\n| `--save` | Save to local report history (browse with `crawlie reports` and `crawlie report <id>`) |\n| `--fail-on error\\|warning` | Return a non-zero exit code for CI gating |\n\nEvery crawl returns two scores: a **Health** score (technical SEO) and a **GEO** score (AI-search readiness).\n\n## Use with agents (MCP)\n\ncrawlie ships a [Model Context Protocol](https://modelcontextprotocol.io) server so an LLM agent can run a full audit and act on it, with no human in the loop. This is the part most SEO tools don't have.\n\nAfter `npm i -g @spronta/crawlie`, `crawlie-mcp` is on your `PATH`. 
For **Claude Desktop**, edit `claude_desktop_config.json`:\n\n```json\n{\n  \"mcpServers\": {\n    \"crawlie\": {\n      \"command\": \"crawlie-mcp\"\n    }\n  }\n}\n```\n\nFor **Claude Code**:\n\n```\nclaude mcp add crawlie crawlie-mcp\n```\n\n(If you built from source instead, use the absolute path to `target/release/crawlie-mcp`.)\n\n(Any MCP-compatible client works: Cursor, Cline, or your own agent. The server speaks JSON-RPC over stdio.)\n\nThe server exposes these tools:\n\n| Tool | Purpose |\n|---|---|\n| `crawl_site` | Crawl and audit a whole site (SEO + GEO); returns scores, issues, and per-page data |\n| `audit_url` | Audit a single page |\n| `audit_urls` | Audit an explicit list of pages |\n| `explain_issue` | Explain why a rule matters and how to fix it |\n| `list_rules` | List the full catalogue of checks |\n| `list_reports` / `get_report` | Read saved crawl history |\n\nExample prompts:\n\n- \"Crawl spronta.com, then give me the top 5 fixes that would most improve my GEO score, with the exact change for each.\"\n- \"Audit these three landing pages and tell me which is least ready to be cited by AI search, and why.\"\n- \"Run a crawl with `--fail-on error` semantics: are there any broken links or 5xx pages blocking launch?\"\n\nThe agent calls `crawl_site`, reads the structured issues, and uses `explain_issue` to turn findings into a prioritized, actionable plan.\n\n## Use cases\n\n- **Pre-launch QA:** catch broken links, redirects, 4xx/5xx responses, and missing metadata before you ship.\n- **GEO optimization:** make pages citable by AI search through structured data, semantic HTML, answer-ready content, and authorship/E-E-A-T signals.\n- **Agent workflows:** let a marketing/SEO agent audit a site and propose fixes autonomously via MCP.\n- **CI/CD gating:** run `crawlie crawl … --fail-on error` in a pipeline to block regressions.\n- **Client reporting:** generate a polished, shareable HTML report in one command.\n- **Auditing AI-generated sites:** verify that the site your agent just built is actually built for search.\n\n## Why I built this\n\nI'm **Sean Ryan**. 
I've spent 6+ years at **Pendo.io** as a Lead Marketing Engineer and lead engineer, and on the side I'm building **Spronta**: AI for marketers.\n\nWith AI, it's faster than ever to ship a marketing site, but most of what gets generated is slop that was never built to be found. And the tools meant to catch that fall short: most SEO auditors cost money, don't play nicely with your agents, or tell you *what's* wrong without telling you *how to actually rank* for SEO **and** GEO (Generative Engine Optimization: being cited by AI search such as ChatGPT, Perplexity, and Google AI Overviews).\n\ncrawlie fixes that. It's free, local-first, and agent-native, and every issue it finds comes with *why it matters* and *how to fix it*.\n\n**If this is useful to you, [connect with me on LinkedIn →](https://linkedin.com/in/sean-exe).** I share what I'm learning while building AI for marketers and SEO/GEO tooling, and I'd love to hear how you're using crawlie.\n\n## Desktop app\n\nA beautiful Tauri + React app (Geist design, light/dark, seamless window chrome):\n\n```\ncd apps/desktop\npnpm install\npnpm tauri dev          # live native crawls\npnpm dev                # preview the UI in a browser (demo data, no backend)\n```\n\nFeatures: whole-site, single-page, and URL-list modes; live progress; **Health** and **GEO** score rings; issues with built-in *why-it-matters* guidance; a sortable pages table; a per-page drawer (GEO signals, headers, schema, hreflang…); auto-saved report history; and one-click shareable HTML export.\n\nBefore the first run, generate the icon set:\n\n`cd src-tauri/icons && python3 generate.py && cd .. && pnpm tauri icon icons/source.png`\n\n## What it checks\n\n*46 rules and counting.*\n\n**Technical SEO:** broken links · 4xx/5xx · redirects & chains · titles & meta descriptions (missing / duplicate / length) · H1s · canonicals · noindex / nofollow / X-Robots-Tag · robots.txt blocking · images missing alt · thin & duplicate content · orphan & deep pages\n\n**Performance & security:** slow responses · large pages · missing compression · HTTPS · mixed content · HSTS\n\n**Mobile, international & social:** viewport · `lang` · hreflang · Open Graph · Twitter cards · structured data\n\n**GEO (Generative Engine Optimization):** structured data, semantic HTML, answer-readiness, authorship/E-E-A-T, dated content, question-style headings, and extractable blocks, rolled into a per-page **GEO score**.\n\nEvery finding links to plain-English guidance: **why it matters**, **how to fix it**, and **what happens if you ignore it**.\n\n## How it compares\n\n| | crawlie | Screaming Frog | Sitebulb |\n|---|---|---|---|\n| Price | Free & open-source | £259/yr to unlock | from £13.50/mo |\n| Engine | Rust, async, tiny binary | Java (JVM) | .NET |\n| CLI with JSON output | ✅ | partial | ❌ |\n| MCP server (agent-native) | ✅ | ❌ | ❌ |\n| GEO (AI/answer-engine audit) | ✅ | ❌ | ❌ |\n| \"Why it matters\" built in | ✅ every issue | ❌ | partial |\n| Shareable HTML report | ✅ | paid | ✅ |\n| Source you can read & extend | ✅ | ❌ | ❌ |\n\n## Architecture\n\n```\ncrates/\n  crawlie-core    # the engine: crawl, audit, score, knowledge base, reports\n  crawlie-cli     # `crawlie`: JSON / pretty / CSV / HTML output\n  crawlie-mcp     # `crawlie-mcp`: Model Context Protocol server (stdio)\napps/\n  desktop         # Tauri v2 + React (Geist) desktop app\n```\n\n`crawlie-core` has zero host dependencies, so the same audited engine drops straight into a cloud worker (it already targets `wasm32`). 
One engine, every surface, identical results.\n\n## Roadmap\n\n- Cloud workers (shared Rust core) for scheduled/remote crawls\n- JavaScript rendering for SPA-heavy sites\n- Crawl-to-crawl comparison & regression alerts\n- Internal-link graph visualization\n\n## License\n\nMIT © **Sean Ryan** / [Spronta](https://spronta.com).\n\nBuilt by Sean Ryan, Lead Marketing Engineer at Pendo.io, who builds AI for marketers at Spronta on the side. [Connect on LinkedIn →](https://linkedin.com/in/sean-exe)\n\nIf crawlie saves you time, a ⭐ on the repo and a hello on LinkedIn mean a lot.", "url": "https://wpnews.pro/news/show-hn-crawlie-free-open-source-seo-audit-tool-for-humans-and-agents", "canonical_source": "https://github.com/spronta/crawlie", "published_at": "2026-06-18 22:54:21+00:00", "updated_at": "2026-06-18 23:00:36.876155+00:00", "lang": "en", "topics": ["developer-tools", "ai-agents"], "entities": ["Spronta", "Crawlie", "Claude Desktop", "Claude Code", "MCP"], "alternates": {"html": "https://wpnews.pro/news/show-hn-crawlie-free-open-source-seo-audit-tool-for-humans-and-agents", "markdown": "https://wpnews.pro/news/show-hn-crawlie-free-open-source-seo-audit-tool-for-humans-and-agents.md", "text": "https://wpnews.pro/news/show-hn-crawlie-free-open-source-seo-audit-tool-for-humans-and-agents.txt", "jsonld": "https://wpnews.pro/news/show-hn-crawlie-free-open-source-seo-audit-tool-for-humans-and-agents.jsonld"}}