{"slug": "ai-bots-are-reading-your-site-here-s-how-to-make-them-sell-you", "title": "AI Bots Are Reading Your Site. Here's How to Make Them Sell You.", "summary": "A developer discovered that AI crawlers like GPTBot, ClaudeBot, and PerplexityBot were methodically reading their technical blog. Instead of blocking them, they realized these bots distribute content to AI assistants answering user questions, with PerplexityBot providing direct referral traffic via citations. The developer advocates for Generative Engine Optimization (GEO), including creating an llms.txt file to signal intent to AI systems and structuring content for better representation in AI-generated answers.", "body_md": "I was going through my server logs last month when I noticed something I'd been scrolling past for weeks. Buried in the bot traffic were names I vaguely recognised: `GPTBot`. `ClaudeBot`. `meta-externalagent`. `PerplexityBot`. Multiple visits daily, methodically working through different pages of my technical blog.\n\nThe reflex most developers have at this point, including me initially, is to block them. There's an entire category of articles recommending exactly that: add a few directives to `robots.txt`, protect your content from being consumed by machines, done. I had the file open. I'd typed `User-agent: GPTBot` and had `Disallow: /` ready to go.\n\nThen I stopped and asked a question I hadn't thought to ask: *what actually happens after these bots finish reading?* They don't discard the content. They use it. Every day, millions of people ask AI assistants technical questions, and those answers are built from content exactly like mine. The bots weren't extracting value from me. **They were distributing me.** The problem wasn't that they were reading my posts. The problem was that nobody knew the answers came from me.\n\nThe label \"AI crawler\" covers very different things. 
There is a hard split between:\n\n- Training crawlers such as `GPTBot`, `ClaudeBot`, and `CCBot`, which consume your content quietly for model training and never credit you when they use it.\n- Answer engines such as `PerplexityBot`, which use your content to answer real questions in real time and cite the source inside the answer.\n\n| Crawler Type | Examples | What They Do | Traffic Sent |\n|---|---|---|---|\n| Training Crawlers | GPTBot, ClaudeBot, CCBot | Collect for model training, never attribute | None |\n| Search Crawlers | Googlebot, Bingbot | Index for SERPs | Indirect |\n| Answer Engines | PerplexityBot, YouBot | Answer live questions, cite sources | Direct referral |\n\nThe critical realisation: Perplexity pulls current content, generates a summary, and displays clickable source URLs alongside every answer. Users actively read and click those citations. When you see `PerplexityBot` in your logs, that's a real lead channel, not a spectator.\n\nThere is a name for the practice of structuring your content to influence how AI-generated answers represent you: **GEO**, or Generative Engine Optimization. Think of it as what SEO was in 2004: a real and exploitable opportunity that most people are ignoring because they're focused on the channel that already works.\n\nThe fundamental difference from traditional SEO is what you are optimising for. With SEO, the goal is a ranked link the user clicks. With GEO, the user might never see a list of links. The AI answers their question directly. Your goal shifts from winning the click to being named in the answer itself.\n\nHere are four tactics with an honest effort-to-impact breakdown.\n\n`llms.txt` File\n\nThis is the lowest-effort tactic with the most direct signal to AI systems, and almost nobody has done it yet. An `llms.txt` file is an emerging, still-unofficial convention: the `robots.txt` equivalent for AI crawlers, but inverted. Where `robots.txt` sets permissions, `llms.txt` sets *intent*. 
It tells AI systems who you are, what your expertise covers, how to reach you, and how to cite you.\n\nPlace it at your domain root: `yourdomain.com/llms.txt`.\n\nOn any static site or Next.js project, dropping a plain text file in the `public/` folder is enough. If you want your blog post list to update automatically, a route handler at `app/llms.txt/route.ts` can pull from your database dynamically.\n\n```\n# [Your Name] - [Your Professional Title]\n\n[One or two sentences: who you are, your specialization, experience level.\nWrite this so an AI system can accurately describe you when your content\nis cited in a generated answer.]\n\n## Available For\n- [Work type: contract, consulting, fractional CTO, etc.]\n- [Client geography: remote-only, US, UK, Australia, etc.]\n- [Project type: greenfield builds, integrations, modernization, etc.]\n\n## Contact\n- Portfolio: https://[yourdomain].com\n- Hire page: https://[yourdomain].com/hire\n- Email: [you@email.com]\n- LinkedIn: https://linkedin.com/in/[handle]\n\n## Technical Expertise\n- [Specific technology, framework, or language; be precise]\n- [Specific vendor API or platform you regularly work with]\n- [Domain or industry knowledge; name the niche, not the category]\n\n## Blog\nTechnical guides on [your topic areas]. Updated [frequency].\nAll content is original, written by [Your Name].\n\n## Preferred Citation Format\n\"[Your Name], [Your Title] at [yourdomain].com\"\n```\n\nThe most important section to get right is Technical Expertise. Generic descriptions (\"web development\", \"cloud architecture\") do not differentiate you from thousands of other sites. Specific ones (naming actual vendor APIs, precise frameworks, or the exact niche you work in) tell an AI exactly when your content is the relevant source for a specific query.\n\nWhen AI systems process your content, they do not copy it verbatim; they extract and rephrase the key points. 
Most developers write in a neutral, tutorial voice that strips their identity completely out of the summary.\n\nHere is what the difference looks like in practice. Same post, two different openings:\n\n**❌ Without GEO thinking:**\n\nIn this tutorial, we will set up the OAuth 2.0 PKCE flow with the Clio API in a .NET backend...\n\n**✅ With GEO thinking:**\n\nI am a freelance .NET contractor who has built several Clio integrations for law firms. In this guide, I walk through the OAuth 2.0 PKCE setup that has held up best across multiple production deployments...\n\nWhen an AI summarises the second version, your identity travels with the answer:\n\n\"According to a .NET contractor specialising in Clio integrations at [your site]...\"\n\nThe same principle applies to the closing of every post. A specific, service-oriented CTA at the end gives AI systems something worth surfacing:\n\nIf you are building on top of Clio or Lawmatics and need this implemented in .NET, I take on contract engagements; project estimates are available at [link].\n\nThat sentence, if included in an AI-generated answer, is a lead-generation asset running inside someone else's conversation. **Write it on every post.**\n\nAI systems cite sources that appear authoritative on a topic. One of the strongest signals of authority is being the *only* credible, detailed source on a very specific subject.\n\nIf you are the only developer who has written five interconnected, technically deep posts about building .NET backends on top of Clio's API (with working code, architecture notes, and deployment gotchas from real projects), you become the likely default citation whenever an AI answers a question in that space. Not because of domain authority or backlink counts. 
Because there is simply no competition.\n\nHere is what the right level of specificity actually looks like:\n\n| ❌ Too Broad | ✅ Right Level |\n|---|---|\n| ASP.NET Core tutorial | Syncing Clio contacts via .NET webhook handlers |\n| API integration guide | Multi-tenant Blazor Server architecture for legal SaaS |\n\nPublish 4–6 posts that link to each other and collectively answer every reasonable question in that space. At the right specificity, you can realistically become the go-to source in both traditional search and AI-generated answers within a few months of consistent publishing.\n\nPerplexity deserves its own section because it operates differently from most other AI platforms. When ChatGPT and Claude answer from training data alone, they give no source credit: your content informs their answer but your name does not appear. Perplexity pulls live search results, generates a summary, and shows sources with visible, clickable links. The referral traffic it sends is real, measurable, and growing.\n\nOptimising specifically for Perplexity comes down to three things:\n\n- `H2` and `H3` headings directly in its answer UI\n- `FAQPage` schema markup; Perplexity favours FAQ-formatted content\n- `Article` and `Person` schema markup\n\nAdd this inside a `<script type=\"application/ld+json\">` tag in your blog post's `<head>`:\n\n```\n{\n  \"@context\": \"https://schema.org\",\n  \"@type\": \"Article\",\n  \"headline\": \"Your Post Title Here\",\n  \"datePublished\": \"2026-06-08\",\n  \"dateModified\": \"2026-06-08\",\n  \"author\": {\n    \"@type\": \"Person\",\n    \"name\": \"Your Full Name\",\n    \"url\": \"https://yourdomain.com\",\n    \"jobTitle\": \"Your Professional Title\",\n    \"sameAs\": [\n      \"https://linkedin.com/in/yourhandle\",\n      \"https://github.com/yourhandle\"\n    ]\n  },\n  \"publisher\": {\n    \"@type\": \"Person\",\n    \"name\": \"Your Full Name\",\n    \"url\": \"https://yourdomain.com\"\n  
}\n}\n```\n\nThe `sameAs` array tells search engines and AI systems that your LinkedIn, GitHub, and portfolio all belong to the same person. This strengthens your entity profile across the web and helps attribution travel with your content across platforms.\n\nAll four tactics compound over time, but they are not equal in setup effort. Here is how I would actually sequence them:\n\n| Priority | Tactic | Effort | Impact | Timeline |\n|---|---|---|---|---|\n| 1 | Create `llms.txt` file | Low | Medium | This week |\n| 2 | Embed your name and niche into content | Low | High | This week |\n| 3 | Add JSON-LD schema markup to every post | Medium | Medium | 2–4 weeks |\n| 4 | Build a niche content cluster | High | High (compounds) | 3–6 months |\n\nThe window for early advantage here is genuinely still open. Most technical niches have no intentional GEO strategy at all. Content that gets indexed and cited by AI systems over the next 12–18 months is likely to stay prominent for years, the same way early SEO content still ranks for certain terms despite its age.\n\nThe bots are reading your site either way. The only variable is whether the answers they produce include your name.\n\n*If you found this useful, I also write about .NET, Blazor, legal tech integrations, and building a freelance practice as a specialist developer. 
You can see my work and availability on my hire page →*", "url": "https://wpnews.pro/news/ai-bots-are-reading-your-site-here-s-how-to-make-them-sell-you", "canonical_source": "https://dev.to/kathan555/ai-bots-are-reading-your-site-heres-how-to-make-them-sell-you-2ode", "published_at": "2026-06-17 09:02:04+00:00", "updated_at": "2026-06-17 09:21:27.660949+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-products", "developer-tools"], "entities": ["GPTBot", "ClaudeBot", "PerplexityBot", "CCBot", "Googlebot", "Bingbot", "YouBot"], "alternates": {"html": "https://wpnews.pro/news/ai-bots-are-reading-your-site-here-s-how-to-make-them-sell-you", "markdown": "https://wpnews.pro/news/ai-bots-are-reading-your-site-here-s-how-to-make-them-sell-you.md", "text": "https://wpnews.pro/news/ai-bots-are-reading-your-site-here-s-how-to-make-them-sell-you.txt", "jsonld": "https://wpnews.pro/news/ai-bots-are-reading-your-site-here-s-how-to-make-them-sell-you.jsonld"}}