{"slug": "what-actually-visits-a-self-hosted-website-in-2026-humans-ai-crawlers-and-6400", "title": "What actually visits a self-hosted website in 2026? Humans, AI crawlers, and 6,400 automated attacks", "summary": "A developer running a self-hosted website on a Raspberry Pi 4B built a public observability dashboard that separates traffic into humans, search engine crawlers, AI retrieval agents, and automated attacks. Over 17 days, the site received 4,523 human visits, 6,409 automated attack attempts, and thousands of crawler requests, with AI agents indexing semantic structure faster than traditional search crawlers. The developer observed that combined machine traffic consistently exceeds human traffic, and AI agents discovered new content faster than Google did.", "body_md": "I run a small self-hosted website on a Raspberry Pi 4B at home.\n\nA few weeks ago I started wondering: who actually visits a website in 2026?\n\nNot just humans. Everything.\n\nSo I built a public observability dashboard on top of GoAccess that separates traffic into four categories: human visitors, search engine crawlers, AI retrieval agents, and automated attacks.\n\nThe numbers from the last 17 days surprised me:\n\n**4,523 human visits\n6,409 automated attack attempts\nThousands of crawler requests from search engines and AI systems**\n\nThe attacks aren't sophisticated. They're mostly automated scanners probing for .env files, WordPress admin panels, and cloud credentials — hitting every public IP on the internet regardless of what's actually running there.\n\nWhat I found more interesting was the AI agent behavior.\n\nAI retrieval agents (GPTBot, ClaudeBot, PerplexityBot, Amazonbot) behave differently from traditional search crawlers. They hit semantic files aggressively — llms.txt, sitemap.xml, JSON-LD structured data — and seem to index the knowledge graph structure of a site rather than individual pages. 
Within hours of publishing new content, multiple AI crawlers had already visited, apparently triggered by the sitemap update rather than any external link.\n\nA few observations I didn't expect:\n\nCombined machine traffic consistently exceeds human traffic\n\nAI agents discovered new content faster than Google did\n\nThe semantic structure exposed by the site seems almost as important as the content itself\n\nEven a Pi on a residential ISP receives constant automated scans (roughly 377 attempts/day on average: 6,409 attempts over 17 days)\n\nI made the dashboard public because I think the machine side of the web is underobserved.\n\nThe modern web feels less like \"users visiting pages\" and more like a parallel ecosystem of crawlers, AI agents, and automated systems running continuously alongside human visitors.\n\nTwo questions:\n\nAre others tracking AI agents separately from traditional search crawlers?\n\nHas anyone else noticed AI retrieval systems indexing semantic structure (JSON-LD, llms.txt) faster than they index page content?", "url": "https://wpnews.pro/news/what-actually-visits-a-self-hosted-website-in-2026-humans-ai-crawlers-and-6400", "canonical_source": "https://dev.to/tommy2970/what-actually-visits-a-self-hosted-website-in-2026humans-ai-crawlers-and-6400-automated-attacks-d6p", "published_at": "2026-06-25 18:52:00+00:00", "updated_at": "2026-06-25 19:13:18.257243+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-agents", "developer-tools", "ai-infrastructure"], "entities": ["Raspberry Pi 4B", "GoAccess", "GPTBot", "ClaudeBot", "PerplexityBot", "Amazonbot", "Google"], "alternates": {"html": "https://wpnews.pro/news/what-actually-visits-a-self-hosted-website-in-2026-humans-ai-crawlers-and-6400", "markdown": "https://wpnews.pro/news/what-actually-visits-a-self-hosted-website-in-2026-humans-ai-crawlers-and-6400.md", "text": "https://wpnews.pro/news/what-actually-visits-a-self-hosted-website-in-2026-humans-ai-crawlers-and-6400.txt", "jsonld": 
"https://wpnews.pro/news/what-actually-visits-a-self-hosted-website-in-2026-humans-ai-crawlers-and-6400.jsonld"}}