Editorial analysis: Changes to default crawl policies shift the economics of web-data collection and model training, with direct implications for dataset sourcing, provenance tracking, and model costs. According to Cloudflare's blog and reporting by The Register and CJR, Cloudflare announced that it will block mixed-use crawlers from ad-supported customer websites by default, and that it will offer managed robots.txt controls plus an option to restrict crawls to monetized pages. CJR and the Transparency Coalition report that Cloudflare is also testing a "pay-per-crawl" feature that would let publishers charge AI companies for crawl access. Cloudflare handles traffic for roughly 20 percent of the web, CJR reports. Per Cloudflare's blog, crawl-to-referral ratios in June 2025 were roughly 14:1 for Google, 1,700:1 for OpenAI, and 73,000:1 for Anthropic; Cloudflare cites these figures to argue that the historic crawl-for-traffic bargain has broken down.