What actually visits a self-hosted website in 2026? Humans, AI crawlers, and 6,400 automated attacks

wpnews.pro

cd /news/artificial-intelligence/what-actually-visits-a-self-hosted-w… · home › topics › artificial-intelligence › article

[ARTICLE · art-39813] src=dev.to ↗ pub=2026-06-25T18:52Z topic=artificial-intelligence verified=true sentiment=· neutral

What actually visits a self-hosted website in 2026? Humans, AI crawlers, and 6,400 automated attacks

A developer running a self-hosted website on a Raspberry Pi 4B built a public observability dashboard that separates traffic into humans, search engine crawlers, AI retrieval agents, and automated attacks. Over 17 days, the site received 4,523 human visits, 6,409 automated attack attempts, and thousands of crawler requests, with AI agents indexing semantic structure faster than traditional search crawlers. The developer observed that combined machine traffic consistently exceeds human traffic, and AI agents discovered new content faster than Google did.

read2 min views1 publishedJun 25, 2026

I run a small self-hosted website on a Raspberry Pi 4B at home.

A few weeks ago I started wondering: who actually visits a website in 2026?

Not just humans. Everything.

So I built a public observability dashboard on top of GoAccess that separates traffic into four categories: human visitors, search engine crawlers, AI retrieval agents, and automated attacks.

The numbers from the last 17 days surprised me:

4,523 human visits 6,409 automated attack attempts Thousands of crawler requests from search engines and AI systems

The attacks aren't sophisticated. They're mostly automated scanners probing for .env files, WordPress admin panels, and cloud credentials — hitting every public IP on the internet regardless of what's actually running there.

What I found more interesting was the AI agent behavior.

AI retrieval agents (GPTBot, ClaudeBot, PerplexityBot, Amazonbot) behave differently from traditional search crawlers. They hit semantic files aggressively — llms.txt, sitemap.xml, JSON-LD structured data — and seem to index the knowledge graph structure of a site rather than individual pages. Within hours of publishing new content, multiple AI crawlers had already visited, apparently triggered by the sitemap update rather than any external link.

A few observations I didn't expect:

Combined machine traffic consistently exceeds human traffic

AI agents discovered new content faster than Google did

The semantic structure exposed by the site seems almost as important as the content itself

Even a Pi on a residential ISP receives constant automated scans (380+ attempts/day average)

I made the dashboard public because I think the machine side of the web is underobserved.

The modern web feels less like "users visiting pages" and more like a parallel ecosystem of crawlers, AI agents, and automated systems running continuously alongside human visitors.

Two questions:

Are others tracking AI agents separately from traditional search crawlers?

Has anyone else noticed AI retrieval systems indexing semantic structure (JSON-LD, llms.txt) faster than they index page content?

source & further reading

dev.to — original article Building an AI Filmmaking API Taught Us That Great Endpoints Don't Create Great Films Day 10 of building an AI agent that controls a phone. Project Log #10: I'm Ditching Screenshots. Here's Why.

~/api · this article 200

$curl api.wpnews.pro/v1/news/what-actually-visits-a-s…

Read original on dev.to → dev.to/tommy2970/what-actually-visits-a-self-hos…

mentioned entities

Raspberry Pi 4B

GoAccess

GPTBot

ClaudeBot

PerplexityBot

Amazonbot

Google

metadata

slugwhat-actually-visits-a-self-hosted-website-in-2026-humans-ai-crawlers-and-6400

topic#artificial-intelligence

secondary4 topics

sentimentneutral

canonicaldev.to

navigation

← prevRevolut enhances fraud detection…

next →Softbank CEO Who’s Invested $64 …

── more in #artificial-intelligence 4 stories · sorted by recency

dev.to · 17 Jun · #artificial-intelligence

AI Bots Are Reading Your Site. Here's How to Make Them Sell You.

dev.to · 23 May · #artificial-intelligence

llms.txt vs robots.txt vs ai.txt: The Developer's Cheat Sheet

dev.to · 25 Jun · #artificial-intelligence

I Let My AI Agent Build a Bedrock RAG Knowledge Base, Here Are the 2 Mistakes the AWS Agent Toolkit Caught

dev.to · 25 Jun · #artificial-intelligence

Southwest Airlines Embraces Cloud and AI Architecture. Are They Setting a New Standard for the Industry?

── more on @raspberry pi 4b 3 stories trending now

wpnews · 19 Oct · #developer-tools

Windows Script to clean up and remove all ASUS software

wpnews · 22 Jun · #generative-ai

Bain tests software takeover targets using vibecoding AI replicas

wpnews · 28 May · #ai-startups

The Niche SaaS Opportunity Map 2026: Highly Demanded Subscribed Categories Beyond Mainstream

sponsored brought to you by zahid.host 4,200+ EU-deployed projects

reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main

→ Live at https://your-agent.zahid.host ✓

Get free account → Pricing

from €0/mo · no card required