cd /news/large-language-models/how-to-give-chatgpt-web-scraping-wit… · home topics large-language-models article
[ARTICLE · art-30142] src=dev.to ↗ pub= topic=large-language-models verified=true sentiment=· neutral

How to Give ChatGPT Web Scraping with MCP Connectors (2026)

A developer built a bridge to connect ChatGPT's custom MCP connectors with the local web scraping tool CrawlForge, since ChatGPT requires remote servers. The solution uses a thin FastMCP wrapper that exposes CrawlForge's REST API as remote MCP tools, enabling ChatGPT to perform web searches and fetch content via CrawlForge.

read4 min views1 publishedJun 16, 2026

ChatGPT can now call your own tools through custom MCP connectors — including web scraping. But there is a catch the marketing pages skip: connectors must be remote servers, so a local tool like CrawlForge cannot be pasted in directly. This is the honest version: what is actually possible, why a wrapper is needed, and the exact bridge to build.

ChatGPT supports custom MCP connectors — renamed "apps" in December 2025, so the UI now says "Apps & Connectors." Through Developer mode, you connect an external MCP server and ChatGPT calls its tools mid-conversation, asking you to confirm before any write action. Same Model Context Protocol that powers web scraping in Claude — different client. Developer mode is explicitly a beta.

Per OpenAI's plan table, adding a custom MCP connector is available on Plus, Pro, Business, Enterprise, and Edu — not Free or Go. Full write-action support is rolling out most broadly to Business, Enterprise, and Edu. If you only need ChatGPT to read scraped data, the read-only path below is enough.

This trips people up. A ChatGPT connector must be a remote MCP server reachable over HTTPS (SSE or Streamable HTTP transport). You paste a URL; you do not point it at a command on your machine. That rules out local stdio servers — the kind you install with npx

. To use one, host it publicly or tunnel a local server via ngrok or Cloudflare Tunnel.

There is also a naming rule: ChatGPT's deep research / company-knowledge paths require two read-only tools named search

and fetch

with a specific schema. Full Developer mode allows arbitrary tools, so that constraint applies only to the deep-research path.

CrawlForge ships as a local stdio MCP server (via npx

) plus a REST API at https://www.crawlforge.dev/api/v1/tools/

. Neither is a remote MCP URL, and its tools are named search_web

, fetch_url

, and extract_content

— not the search

/fetch

pair deep research expects. So you cannot paste CrawlForge straight into ChatGPT today. The practical path is a thin remote MCP wrapper — about 30 lines.

FastMCP (Python) is the quickest way to stand up a remote MCP server exposing the search

  • fetch

tools ChatGPT wants. Each calls CrawlForge's REST API with your cf_live_

key in the X-API-Key

header:

import os
import httpx
from fastmcp import FastMCP

mcp = FastMCP("CrawlForge Bridge")
BASE = "https://www.crawlforge.dev/api/v1/tools"
HEADERS = {"X-API-Key": os.environ["CRAWLFORGE_API_KEY"]}

@mcp.tool()
async def search(query: str) -> list[dict]:
    """Search the web. Returns id/title/url results for ChatGPT."""
    async with httpx.AsyncClient(timeout=30) as client:
        r = await client.post(f"{BASE}/search_web", headers=HEADERS,
                              json={"query": query, "limit": 10})
    results = r.json().get("results", [])
    return [{"id": x["link"], "title": x["title"], "url": x["link"]} for x in results]

@mcp.tool()
async def fetch(id: str) -> dict:
    """Fetch full page content by id (the URL) for ChatGPT."""
    async with httpx.AsyncClient(timeout=30) as client:
        r = await client.post(f"{BASE}/extract_content", headers=HEADERS,
                              json={"url": id})
    data = r.json()
    return {"id": id, "title": data.get("title", id),
            "text": data.get("content", ""), "url": id}

if __name__ == "__main__":
    mcp.run(transport="http", host="0.0.0.0", port=8000)

Run it and expose it over HTTPS. For a quick test, tunnel your local port:

pip install fastmcp httpx
export CRAWLFORGE_API_KEY="cf_live_your_key_here"
python server.py
ngrok http 8000

For Developer mode you can skip the search

/fetch

naming and map tools one-to-one to CrawlForge — expose scrape_structured

, stealth_mode

, or deep_research

directly. Same pattern.

/mcp

), name it, choose an auth method.Your search

and fetch

tools appear. In a chat, select the connector and ask ChatGPT to research a topic — it calls search

, then fetch

es the best results through CrawlForge.

Connectors authenticate with none (public) or OAuth — there is no API-key-header option in the UI, which is why the wrapper holds your CrawlForge key server-side. ChatGPT confirms before write actions, and you can inspect each call before approving.

Take OpenAI's warnings seriously: only connect servers you trust. Custom connectors increase risk, including prompt injection, and a model mistake on a write action could destroy or leak data. A read-only scraping bridge is low-risk; lock it down with OAuth before sharing.

If you would rather not host anything, use CrawlForge from code with the OpenAI Agents SDK or Responses API — no remote server required. See CrawlForge with the OpenAI Agents SDK.

── more in #large-language-models 4 stories · sorted by recency
── more on @chatgpt 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/how-to-give-chatgpt-…] indexed:0 read:4min 2026-06-16 ·