{"slug": "crawlforge-v4-2-2-new-cli-3-tools-for-local-ai-scraping", "title": "CrawlForge v4.2.2: New CLI + 3 Tools for Local AI Scraping", "summary": "CrawlForge v4.2.2 introduces a new command-line interface (CLI) and three new tools, shifting its focus toward local, API-key-free web scraping for AI. The new tools include `extract_with_llm`, which defaults to local Ollama models for structured data extraction, and `scrape_template`, which provides pre-built scrapers for sites like Amazon and GitHub. The release also adds a tool to list local Ollama models, with all new features accessible via the CLI and sharing the same credit system as the existing API.", "body_md": "Today we are shipping **CrawlForge v4.2.2**, our biggest release since launch. It brings three new tools, a standalone command-line interface, and a quiet shift in how we think about web scraping for AI: most of it should run locally, on your own machine, without API keys.\n\nThis post is the umbrella for everything in 4.2.2. Three deep-dive guides follow in the next nine days.\n\n## Table of Contents\n\n- [What Shipped](#what-shipped)\n- [The New CrawlForge CLI](#the-new-crawlforge-cli)\n- [Extract With LLM: Local AI Extraction](#extract-with-llm-local-ai-extraction)\n- [Scrape Template: Ten Sites, One Call](#scrape-template-ten-sites-one-call)\n- [list_ollama_models: Free Model Discovery](#list_ollama_models-free-model-discovery)\n- [Old Workflow vs v4.2.2 Workflow](#old-workflow-vs-v422-workflow)\n- [Credit Costs](#credit-costs)\n- [How to Upgrade](#how-to-upgrade)\n- [What Is Next](#what-is-next)\n\n## What Shipped\n\nv4.2.2 adds four things:\n\n- **@crawlforge/cli** -- a standalone command-line tool exposing all 23 CrawlForge tools to your shell. No MCP client required.\n- **extract_with_llm** -- LLM-powered structured extraction that defaults to local Ollama. No external API key needed.\n- **scrape_template** -- pre-built scrapers for Amazon, LinkedIn, GitHub, YouTube, Reddit, Hacker News, Stack Overflow, npm, Product Hunt, and Twitter/X.\n- **list_ollama_models** -- a free discovery tool that lists models on your local Ollama instance.\n\nTool count goes from 20 to 23. The CLI is brand new -- it is not a tool, it is a delivery channel.\n\n```\n+----------------+       +-------------------+       +----------------+\n|   Your Shell   | <-->  |  @crawlforge/cli  | <-->  |  CrawlForge    |\n|   (cron, CI)   |       |  (JSON in/out)    |       |   API + Tools  |\n+----------------+       +-------------------+       +----------------+\n                                  ^\n                          No MCP handshake.\n                          Just HTTPS + stdout.\n```\n\n## The New CrawlForge CLI\n\nThe CLI is the shortest path from intent to scraped data. You install it once, set an environment variable, and every CrawlForge tool becomes a command:\n\n```\nnpm install -g @crawlforge/cli\nexport CRAWLFORGE_API_KEY=\"cf_live_your_key_here\"\n\ncrawlforge scrape https://example.com\ncrawlforge search \"best MCP servers 2026\"\ncrawlforge research \"AI agent frameworks\" --depth 3\n```\n\nWhy does this matter? Because MCP is great for AI agents, but a lot of scraping work is not an AI agent task. It is a cron job. A CI step. A one-off pull from your terminal. For that, you want JSON on stdout that pipes into jq, not a JSON-RPC handshake.\n\n## Why have a CLI when MCP already exists?\n\nMCP is optimized for AI agents picking tools dynamically. The CLI is optimized for **humans typing commands** and **scripts piping JSON**. Different shapes for different jobs:\n\n| Workflow | Best fit |\n|---|---|\n| Claude/Cursor agent | MCP |\n| Cron job | CLI |\n| GitHub Actions step | CLI |\n| One-off terminal | CLI |\n| Server in a loop | Raw API |\n\nAll three paths hit the same backend, share the same credit balance, and use the same API key.\n\nRead the [complete CrawlForge CLI guide](https://www.crawlforge.dev/blog/web-scraping-cli-complete-guide) for the full command reference and real-world workflows.\n\n## Extract With LLM: Local AI Extraction\n\n`extract_with_llm` is structured extraction powered by a language model. You hand it a URL and a schema, it gives you back JSON. The new part is that it defaults to **local Ollama** rather than calling OpenAI or Anthropic.\n\n```\n{\n  \"url\": \"https://news.ycombinator.com/item?id=123456\",\n  \"schema\": {\n    \"type\": \"object\",\n    \"properties\": {\n      \"title\":    { \"type\": \"string\" },\n      \"points\":   { \"type\": \"number\" },\n      \"comments\": { \"type\": \"number\" }\n    }\n  },\n  \"provider\": \"ollama\",\n  \"model\": \"llama3.1:8b\"\n}\n```\n\nThree things follow from the local-first default:\n\n- **No third-party API costs.** The LLM is free. You only pay 3 CrawlForge credits per extraction.\n- **No data leaving your machine.** Scraped content stays on localhost.\n- **No new API key to manage.** If Ollama is installed, you are done.\n\n## When to still use OpenAI or Anthropic\n\nLocal models are great for predictable schemas (titles, prices, counts, ratings). For long-form reasoning -- summarizing a 10,000-word article, classifying nuanced sentiment, extracting fields that require world knowledge -- a frontier model still wins. Switch providers with one parameter:\n\n```\ncrawlforge extract https://example.com \\\n  --provider anthropic \\\n  --model claude-sonnet-4-6\n```\n\nYou pay the provider's per-token cost plus 3 CrawlForge credits. Same schema, same output shape.\n\nDetailed guide: [extract data with local LLMs](https://www.crawlforge.dev/blog/extract-data-with-local-llms-ollama).\n\n## Scrape Template: Ten Sites, One Call\n\n`scrape_template` is for the long tail of scraping requests that all look the same: \"get me product data from Amazon\", \"get me a GitHub repo's metadata\", \"get me the top posts on Hacker News today\". You should not need to write CSS selectors for these. We did it once, we maintain it, you call it.\n\n```\ncrawlforge template amazon --url \"https://www.amazon.com/dp/B0CHX1W1XY\"\ncrawlforge template github --url \"https://github.com/anthropics/anthropic-sdk-python\"\ncrawlforge template hackernews --top 10\n```\n\nTen templates ship in this release:\n\n| Template | What it returns | Credits |\n|---|---|---|\n| `amazon` | Product title, price, rating, reviews, images | 1 |\n| `linkedin` | Profile name, headline, experience, skills | 1 |\n| `github` | Repo metadata, stars, languages, README | 1 |\n| `youtube` | Video title, views, channel, transcript | 1 |\n| `reddit` | Post title, score, comments, top replies | 1 |\n| `hackernews` | Story title, points, URL, comments | 1 |\n| `stackoverflow` | Question, answers, accepted, vote counts | 1 |\n| `npm` | Package metadata, weekly downloads, versions | 1 |\n| `producthunt` | Product name, tagline, upvotes, makers | 1 |\n| `tweet` | Tweet text, author, engagement, replies | 1 |\n\nFull walkthrough with code: [scrape Amazon, LinkedIn, and GitHub with one tool](https://www.crawlforge.dev/blog/scrape-amazon-linkedin-github-templates).\n\n## list_ollama_models: Free Model Discovery\n\nMost useful as a sanity-check before running `extract_with_llm`. Lists every model on your local Ollama instance with name, size, and modified date.\n\n```\ncrawlforge extract --list-ollama-models\n```\n\nCosts **zero credits**. 
It does no scraping, no LLM call -- it just queries Ollama's local API on `127.0.0.1:11434` and returns the result. If you have ever wondered which model you actually have installed, this is the answer.\n\n## Old Workflow vs v4.2.2 Workflow\n\n| Task | Pre-4.2.2 | v4.2.2 |\n|---|---|---|\n| Scrape from your terminal | curl + custom parser, or boot a Node REPL | `crawlforge scrape <url>` |\n| Extract structured data with LLM | `extract_structured` (CSS selectors) or roll your own with Puppeteer + OpenAI | `extract_with_llm` (Ollama default) |\n| Scrape Amazon, LinkedIn, GitHub | `scrape_structured` with hand-maintained selectors | `scrape_template` (we maintain selectors) |\n| Run scraping in CI/cron | curl with API key in headers | `crawlforge <cmd>` with env var |\n\n## Credit Costs\n\nThe three new tools follow our existing credit-cost model. No surprises:\n\n| Tool | Credits | Why |\n|---|---|---|\n| `list_ollama_models` | 0 | Free discovery helper |\n| `scrape_template` | 1 | Single page, pre-built schema |\n| `extract_with_llm` | 3 | LLM inference (provider-agnostic) |\n\nThe CLI itself is free. It uses your existing API key and bills against your normal credit balance.\n\n## How to Upgrade\n\nExisting users do not need to do anything. The new tools are live on all plans -- Free, Hobby, Professional, and Business -- and show up automatically in your MCP client.\n\n## Install the CLI\n\n```\nnpm install -g @crawlforge/cli\nexport CRAWLFORGE_API_KEY=\"cf_live_...\"\ncrawlforge --help\n```\n\nAdd the `export` line to your shell profile (`~/.zshrc`, `~/.bashrc`) so it persists. For CI, set `CRAWLFORGE_API_KEY` as a repository secret.\n\n## Try Ollama-powered extraction\n\n```\n# 1. Install Ollama (one-time)\ncurl -fsSL https://ollama.com/install.sh | sh\n\n# 2. Pull a model (llama3.1:8b is a good start)\nollama pull llama3.1:8b\n\n# 3. Run extraction through CrawlForge\ncrawlforge extract https://example.com \\\n  --provider ollama \\\n  --model llama3.1:8b\n```\n\nThe first run pulls about 5 GB. After that, every extraction is local, free, and offline-capable.\n\n## What Is Next\n\nWe are working on three things for 4.3:\n\n- **More templates** -- Etsy, eBay, TikTok, Instagram, Google Maps. Send us requests on [Discord](https://discord.gg/crawlforge).\n- **Webhook delivery for batch_scrape** -- get results pushed to your endpoint when long-running jobs complete.\n- **CLI watch mode** -- `crawlforge track --watch` for live diffs on monitored pages.\n\n**Ready to try the new tools?** Free tier still includes 1,000 credits and no credit card.\n\nOr jump straight into the deep dives:", "url": "https://wpnews.pro/news/crawlforge-v4-2-2-new-cli-3-tools-for-local-ai-scraping", "canonical_source": "https://dev.to/simon_crawlforge_dev/crawlforge-v422-new-cli-3-tools-for-local-ai-scraping-5954", "published_at": "2026-05-18 23:21:46+00:00", "updated_at": "2026-05-19 00:04:14.235173+00:00", "lang": "en", "topics": ["developer-tools", "open-source", "artificial-intelligence", "data", "products"], "entities": ["CrawlForge", "CrawlForge v4.2.2", "@crawlforge/cli", "MCP"], "alternates": {"html": "https://wpnews.pro/news/crawlforge-v4-2-2-new-cli-3-tools-for-local-ai-scraping", "markdown": "https://wpnews.pro/news/crawlforge-v4-2-2-new-cli-3-tools-for-local-ai-scraping.md", "text": "https://wpnews.pro/news/crawlforge-v4-2-2-new-cli-3-tools-for-local-ai-scraping.txt", "jsonld": "https://wpnews.pro/news/crawlforge-v4-2-2-new-cli-3-tools-for-local-ai-scraping.jsonld"}}