{"slug": "ai-crawlers-are-inflating-your-view-counts", "title": "AI crawlers are inflating your view counts", "summary": "AI crawlers are inflating web view counts by making roughly 100,000 requests per day, skewing analytics meant to measure human engagement. A developer fixed the issue by moving tracking to client-side JavaScript; when crawlers later found the tracking endpoint and hit it directly, a robots.txt rule and a bot detection guard turned those requests away cheaply.", "body_md": "Your most-viewed page might be one no human has ever opened. That is what **AI crawlers** have done to view tracking in 2026.\n\nI ran into this problem on a production app that needed engagement tracking. The first version tracked everything server-side, the way Rails apps have done analytics for years. It broke within a day.\n\n## The problem: crawlers inflate every count\n\nWe used [Ahoy](https://github.com/ankane/ahoy) for tracking. Each controller action called `ahoy.track` while rendering the page, and every event rolled up into a denormalized counter column with `counter_culture`.\n\nThe issue is that server-side tracking fires on every request, including requests from bots. AI crawlers like Meta-ExternalAgent, Bytespider, and Baiduspider were making roughly 100,000 requests per day. They were not attacking the site; they were reading pages to feed training pipelines.\n\nAhoy has bot detection built in. It uses the `device_detector` gem to check user agents and skips known bots. Its bot list catches Googlebot and older crawlers, but it misses the new wave of AI crawlers. As a result, every one of those requests created an `Ahoy::Event` row and incremented the corresponding counters.\n\nOur view counts were not measuring human interest. They were measuring how hungry the scrapers were that week.\n\n## Fix one: require JavaScript\n\nChasing user agent strings is a losing game. New crawlers appear faster than blocklists update. 
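A minimal Ruby sketch shows why this approach keeps falling behind. It assumes a hypothetical hand-maintained list of tokens, taken from the crawlers named above, matched case-insensitively as substrings of the user agent. The sample strings are simplified stand-ins, not the crawlers' real headers. Any crawler released after the list was written gets past it.

```ruby
# Hypothetical supplement to DeviceDetector: a hand-maintained list of
# AI crawler tokens, matched case-insensitively as substrings.
AI_CRAWLER_TOKENS = %w[meta-externalagent bytespider baiduspider].freeze

# True when the user agent contains any listed token; nil is treated as empty.
def known_ai_crawler?(user_agent)
  ua = user_agent.to_s.downcase
  AI_CRAWLER_TOKENS.any? { |token| ua.include?(token) }
end

# Simplified, illustrative user agent strings (not exact real headers).
samples = [
  'Mozilla/5.0 (compatible; Bytespider)',
  'meta-externalagent/1.1',
  'NewAIBot/0.1' # hypothetical crawler released after the list was written
]

samples.each do |ua|
  puts format('%-40s blocked=%s', ua, known_ai_crawler?(ua))
end
```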
But there is one thing AI crawlers reliably do not do, and that is execute JavaScript.\n\nSo we moved view tracking out of the controllers. Pages declare what is trackable as a data attribute, and a small Stimulus controller fires a beacon after the page loads.\n\n```javascript\nconnect() {\n  // Turbo can reconnect the controller; the flag prevents double-counting\n  if (this.element.dataset.viewTrackerFired === \"true\") return\n  this.element.dataset.viewTrackerFired = \"true\"\n\n  // Fire when the browser is idle (or after 2 seconds at the latest),\n  // falling back to a short timeout where requestIdleCallback is unsupported\n  const fire = () => this.fire()\n  if (\"requestIdleCallback\" in window) {\n    requestIdleCallback(fire, { timeout: 2000 })\n  } else {\n    setTimeout(fire, 500)\n  }\n}\n```\n\nA few details mattered here:\n\n- `requestIdleCallback` defers the beacon until the browser is idle, so tracking never competes with rendering. The 2-second timeout guarantees it still fires on busy pages.\n- `keepalive: true` on the beacon's `fetch` call lets the request survive the user navigating away immediately.\n- The fired flag guards against Turbo reconnecting the controller and double-counting.\n\nCrawlers fetch the HTML and move on. Real browsers run the beacon and get counted. View counts dropped sharply the day this deployed. That was the fix landing, not a regression.\n\n## Fix two: the bots found the beacon\n\nThree days later, the tracking endpoint `/track/events` was the most-crawled path on the site. Crawlers do not execute JavaScript, but they do parse the page. The endpoint URL sits in the markup as a data attribute, so the scrapers extracted it and started requesting it directly.\n\nNone of those requests created events, but they still burned through the full Rails stack for nothing. 
The fix was two cheap layers.\n\nFirst, robots.txt for the well-behaved bots:\n\n```\nUser-agent: *\nDisallow: /track/\n```\n\nSecond, a guard in the controller for everyone else:\n\n```ruby\nclass TrackingEventsController < ApplicationController\n  # Responding inside a before_action halts the chain, so the action never runs\n  before_action :reject_bots\n\n  private\n\n  # Known bots get an empty 204 No Content before any parsing or database work\n  def reject_bots\n    head :no_content if DeviceDetector.new(request.user_agent).bot?\n  end\nend\n```\n\nAny request whose user agent DeviceDetector recognizes as a bot gets a 204 before the action runs: no parsing, no resource lookups, no database work. The well-behaved crawlers respect robots.txt and never arrive, and the rest get the cheapest possible response.\n\n## The takeaway\n\nServer-side analytics was built for a web that no longer exists. In 2026, a meaningful share of your traffic comes from AI crawlers, so counting views on the server measures scraper appetite, not audience.\n\nThe defense is not one clever trick. It is stacked cheap layers: robots.txt for the bots that ask permission, a user agent check that returns early for the ones that announce themselves, and a JavaScript beacon for the bots that do neither.\n\nCheck your own numbers. 
If your view counts have never had a suspicious cliff in them, the bot tax is probably still baked in.", "url": "https://wpnews.pro/news/ai-crawlers-are-inflating-your-view-counts", "canonical_source": "https://feed.thoughtbot.com/link/24077/17361774/ai-crawlers-are-inflating-your-view-counts", "published_at": "2026-06-16 00:00:00+00:00", "updated_at": "2026-06-16 00:22:14.124528+00:00", "lang": "en", "topics": ["developer-tools"], "entities": ["Ahoy", "Meta-ExternalAgent", "Bytespider", "Baiduspider", "Stimulus", "Rails"], "alternates": {"html": "https://wpnews.pro/news/ai-crawlers-are-inflating-your-view-counts", "markdown": "https://wpnews.pro/news/ai-crawlers-are-inflating-your-view-counts.md", "text": "https://wpnews.pro/news/ai-crawlers-are-inflating-your-view-counts.txt", "jsonld": "https://wpnews.pro/news/ai-crawlers-are-inflating-your-view-counts.jsonld"}}