AI assistants like ChatGPT, Claude, and Perplexity are increasingly crawling the web for context. But most websites aren't optimised for AI readability — they're built for human browsers with complex HTML, JavaScript navigation, and boilerplate-heavy layouts.
The **llms.txt standard** is changing this. It's a simple convention: place a `llms.txt` file at your site root that gives AI systems clean, structured content they can actually understand.

I built a tool that generates these files automatically for any website.
Think of it as `robots.txt` but for LLMs. Three files form the standard:

- `llms.txt`
- `llms-full.txt`
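To make the file layout concrete, here is a minimal sketch that assembles an `llms.txt` body in the shape the public llms.txt proposal describes: an H1 title, a blockquote summary, then H2 sections containing link lists. The function name `build_llms_txt`, the project name, and the URL are hypothetical placeholders, not output from the generator.

```python
def build_llms_txt(title, summary, sections):
    """Return llms.txt text: H1 title, blockquote summary, H2 link sections."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            # Each entry is a markdown link followed by a short description.
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical example data, used only to illustrate the layout.
text = build_llms_txt(
    "Example Project",
    "A hypothetical library used only to illustrate the file layout.",
    {
        "Docs": [
            ("Quick start", "https://example.com/docs/quickstart.md", "Install and first steps"),
        ]
    },
)
print(text)
```

The result is ordinary markdown, so it stays readable for people as well as for language models.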
The llms.txt Generator crawls any website using BFS (Breadth-First Search).

Traditional SEO targets Google's crawler. But a new category is emerging: SEO for AI.
When a user asks ChatGPT "what is [your product]?", the AI draws on its training data and web results. If your site has a clean `llms.txt`, the AI gets structured, accurate content instead of parsing your homepage HTML.
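The breadth-first crawl the generator is described as using can be sketched roughly as follows. This is not the tool's actual code: `bfs_crawl`, the `get_links` callback, and the in-memory `SITE` map are assumptions made so the example runs without network access. The `max_pages` argument mirrors the `maxPages` parameter.

```python
from collections import deque


def bfs_crawl(start_url, get_links, max_pages=50):
    """Visit pages breadth-first and stop once max_pages have been visited."""
    queue = deque([start_url])
    seen = {start_url}  # URLs already queued, so no page is visited twice
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical site structure standing in for real HTTP fetches.
SITE = {
    "https://example.com/": ["https://example.com/docs", "https://example.com/blog"],
    "https://example.com/docs": ["https://example.com/docs/api", "https://example.com/"],
    "https://example.com/blog": ["https://example.com/docs"],
    "https://example.com/docs/api": [],
}

order = bfs_crawl("https://example.com/", lambda u: SITE.get(u, []), max_pages=3)
print(order)
```

Because BFS visits pages closest to the start URL first, a low page limit tends to keep top-level pages and drop deeply nested ones.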
| Parameter | Default | Description |
|---|---|---|
| `startUrls` | required | Website URLs to crawl |
| `maxPages` | 50 | Maximum pages to process |
| `outputFormat` | markdown | Output format (markdown/plaintext) |
| `includePatterns` | [] | URL patterns to include |
| `excludePatterns` | [] | URL patterns to exclude |

I tested it on Pydantic's documentation (docs.pydantic.dev). Result: 2 pages processed, full content extracted with zero boilerplate.
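One plausible way `includePatterns` and `excludePatterns` could interact is sketched here. The post does not say which pattern syntax the tool accepts, so glob-style matching, the exclude-wins rule, and the helper name `should_crawl` are all assumptions.

```python
from fnmatch import fnmatchcase


def should_crawl(url, include_patterns=(), exclude_patterns=()):
    """Decide whether a URL passes the filters (assumed glob semantics)."""
    # Assumption: an exclude match always wins over an include match.
    if any(fnmatchcase(url, p) for p in exclude_patterns):
        return False
    # Assumption: an empty include list means "include everything".
    if include_patterns:
        return any(fnmatchcase(url, p) for p in include_patterns)
    return True


# Hypothetical URLs and patterns for illustration.
print(should_crawl("https://example.com/docs/api", ["*/docs/*"], ["*/blog/*"]))
print(should_crawl("https://example.com/blog/post", [], ["*/blog/*"]))
```

With both lists left at their default of `[]`, every discovered URL passes the filter under these assumptions.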
Live on the Apify Store: [llms.txt Generator](https://apify.com/darknezz/llms-txt-generator)
Pricing is $0.01 per page processed. Free tier covers ~50 pages.
The llms.txt standard is still emerging, but early adopters will have an advantage as AI-driven search grows. Is your website AI-readable?