llms.txt — Making Your Site Navigable by Agents

By late 2025, less than 1% of active websites had adopted the llms.txt convention, which makes early implementation a differentiator; Cloudflare, Anthropic, Stripe, and Vercel already ship one. The file is a Markdown document at the site root that gives AI agents a curated, structured map of the site's content, while an optional llms-full.txt delivers the entire corpus in a single HTTP request. Together they address the interface mismatch between HTML, designed in 1993 for documents rendered in browsers, and modern AI systems that need lossless, machine-readable content at inference time.

- As of late 2025, llms.txt is still on under 1% of active websites, so early adoption is a differentiator; Cloudflare, Anthropic, Stripe, and Vercel all ship one.
- llms.txt gives agents a curated, structured map; llms-full.txt gives them the entire corpus in one HTTP GET.
- A permissions block is the clearest machine-readable signal a site owner can give about AI inference-time use; robots.txt was designed for crawling, not for this.

Your site has two audiences now. Browsers render your HTML for humans. Agents need something else entirely.

When an AI agent visits a website, it doesn't see your carefully designed layout, your navigation bar, or your footer links. It sees a wall of text extracted from DOM elements: ads, cookie banners, navigation chrome, JavaScript-rendered content that may not even load. The conversion from HTML to useful context is lossy, expensive, and unreliable.

[llms.txt](https://llmstxt.org/) fixes this. It's a Markdown file at your site's root that gives AI systems a curated, structured map of what's here and where to find it. Think of it as robots.txt for the inference era, except that instead of telling crawlers what to avoid, it tells agents what to consume and how.

HTML was designed in 1993 for documents rendered in browsers. Thirty years of evolution added navigation menus, advertising slots, JavaScript bundles, cookie consent modals, embedded tracking, and layout frameworks, all of it optimized for human visual processing.

An AI agent processing that same page has to work through several extraction steps, and every step is lossy. Navigation links get mixed with content links. Code blocks lose formatting. Tables collapse. Metadata vanishes. The agent gets a degraded version of what you wrote.

This is the interface mismatch: your content is structured and valuable, but the delivery format (HTML) was never designed for machine consumption at inference time.

Jeremy Howard proposed [llms.txt](https://llmstxt.org/) in September 2024.
The idea is simple: put a Markdown file at /llms.txt that serves as a curated index of your site for AI systems. The format:

    # Site Name

    > One-line description of what this site is.

    Optional context paragraphs: key information an agent needs to understand everything else.

    ## Section Name

    - [Page Title](https://url): Brief description of what's there

    ## Optional

    - [Less Important Page](https://url): Can be skipped for shorter context

That's the entire spec. An H1 with the site name. A blockquote summary. Optional context. Then sections of links with descriptions. The Optional section has special meaning: agents can skip it when context is tight.

Adoption has been rapid. [Over 844,000 sites](https://www.soar.sh/blog/llms-txt-file-brand-guide) had implemented llms.txt by late 2025. That's still less than 1% of all active websites, which means being in that group is currently a differentiator, not table stakes. Cloudflare, Anthropic, Stripe, Zapier, and Vercel all ship one. It's not a niche experiment; it's becoming baseline infrastructure for any site that expects AI systems to interact with its content.

llms.txt is the map. llms-full.txt is the territory. Where llms.txt provides navigation and structure, llms-full.txt contains the actual complete content: every page, concatenated into a single Markdown file. One HTTP request, one response, full corpus.

The [relationship between the two](https://llms-txt.io/blog/llms-txt-and-llms-full-txt) is complementary:

| | llms.txt | llms-full.txt |
|---|---|---|
| Purpose | Navigation and structure | Complete content |
| Size | Small (< 10KB) | Large (can be multiple MB) |
| Use case | Quick orientation, selective retrieval | Full-context assistance, RAG ingestion |
| Analogy | Table of contents | The entire book |

Different AI tools use them differently. A chat assistant might read llms.txt to understand what's available, then fetch specific linked pages as needed.
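
As a rough illustration of that selective-retrieval pattern, the sketch below parses an llms.txt body into its title, summary, and link sections, then lists the URLs an agent might fetch while skipping the Optional section. It is not an official parser: the names `LlmsIndex`, `parse_llms_txt`, and `links_to_fetch` are hypothetical, and it assumes links are written as Markdown list items of the form `- [Title](url): description`.

```python
import re
from dataclasses import dataclass, field

# Assumed link shape: "- [Title](url): optional description"
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<desc>.*))?$"
)


@dataclass
class LlmsIndex:
    title: str = ""
    summary: str = ""
    # section name -> list of (link title, url, description)
    sections: dict[str, list[tuple[str, str, str]]] = field(default_factory=dict)


def parse_llms_txt(text: str) -> LlmsIndex:
    """Parse the H1, blockquote summary, and H2 link sections of an llms.txt body."""
    index = LlmsIndex()
    current = None  # name of the H2 section currently being read
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not index.title:
            index.title = line[2:].strip()
        elif line.startswith("> ") and not index.summary:
            index.summary = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            index.sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:  # plain bullets and prose inside a section are ignored
                index.sections[current].append(
                    (match["title"], match["url"], match["desc"] or "")
                )
    return index


def links_to_fetch(index: LlmsIndex, include_optional: bool = False) -> list[str]:
    """Return link URLs in file order, dropping the Optional section by default."""
    return [
        url
        for name, links in index.sections.items()
        if include_optional or name != "Optional"
        for _title, url, _desc in links
    ]
```

A context-constrained agent would call `links_to_fetch(index)` and retrieve only those pages; one with room to spare could pass `include_optional=True`.
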
A development environment like Cursor or Claude Code might prefer llms-full.txt: load the entire corpus into context and work with complete knowledge. A RAG pipeline might ingest llms-full.txt wholesale and chunk it for semantic search.

The dual-file approach means you serve both patterns: selective retrieval for context-constrained systems, and full ingestion for systems with room.

This site runs on Astro, a static site generator that compiles everything to HTML at build time. The llms.txt and llms-full.txt files ship as part of the same build process.

/llms.txt is hand-authored. It's a curated index: I decide what sections to highlight, what descriptions to write, what the site's one-line summary is. This is editorial work, not automation. It looks like this:

    # Artificial Curiosity Labs

    > Writing about AI-native work, agent infrastructure, and what happens when curiosity meets technology.

    ## Content

    - [Blog](https://artificialcuriositylabs.dev/posts): All posts
    - [About](https://artificialcuriositylabs.dev/about): Who I am
    - [Full text for LLMs](https://artificialcuriositylabs.dev/llms-full.txt): Complete content
    - [RSS Feed](https://artificialcuriositylabs.dev/rss.xml): Subscribe

    ## Topics

    - AI-native work as an operating model
    - AWS Bedrock — AgentCore, Claude models, inference patterns
    - Claude Code — setup, ops, MCP server configuration
    - Multi-agent architectures and patterns

    ## Permissions

    This site grants permission to AI systems to index, retrieve, and cite all content, provided attribution is given.

/llms-full.txt is auto-generated. A build script reads every .md file from the blog content directory, preserves frontmatter (title, date, description, tags), and concatenates them with --- separators. The script runs in under a second as part of the normal Astro build.

The generator is straightforward: it reads the .md files from src/data/blog/ and writes public/llms-full.txt. No runtime. No API calls. No database. Just a build step that reads files and writes a file.
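
The site's actual build script isn't shown here, so the following is only a minimal sketch of the step just described, written in Python for illustration rather than in whatever language the real script uses. The function name `build_llms_full` is hypothetical, and the sketch assumes each post is copied verbatim with its frontmatter intact.

```python
from pathlib import Path


def build_llms_full(content_dir: Path, out_file: Path) -> int:
    """Concatenate every .md file under content_dir into out_file.

    Posts are copied verbatim (frontmatter included) and joined with a
    '---' separator line. Returns the number of posts written.
    """
    # Sort paths so unchanged content always produces an identical file.
    posts = sorted(content_dir.rglob("*.md"))
    bodies = [post.read_text(encoding="utf-8").strip() for post in posts]

    # Create the output directory if the build has not done so already.
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_text("\n\n---\n\n".join(bodies) + "\n", encoding="utf-8")
    return len(posts)
```

Calling `build_llms_full(Path("src/data/blog"), Path("public/llms-full.txt"))` from a build hook would use the paths mentioned above. One caveat with this assumed layout: frontmatter is itself fenced by `---` lines, so a consumer that splits on the separator has to tell the two apart.
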
The output is a static asset served from the CDN like any other page: cached globally, available instantly.

One detail worth calling out: the Permissions section in my llms.txt explicitly grants AI systems the right to index, retrieve, and cite the content with attribution.

This matters because the legal landscape around AI training and inference-time retrieval is unsettled. robots.txt was designed for crawling, not for inference-time consumption. Some sites use robots.txt to block AI crawlers entirely. Others want their content consumed but not used for training.

The permissions block in llms.txt is the clearest signal a site owner can give: yes, AI systems may use this content at inference time, under these conditions. It's not legally binding in the way a license is, but it's an explicit, machine-readable statement of intent that removes ambiguity.

The payoff isn't theoretical. Here's what happens when your site has a well-structured llms-full.txt:

Any AI agent can consume your entire site in one request. No crawling, no pagination, no JavaScript rendering. A single fetch returns clean Markdown with preserved structure, links, and metadata.

Citation becomes trivial. When an agent pulls from your llms-full.txt, the source URL is known, the content is clean, and attribution is straightforward. Compare this to crawling HTML, where the agent has to guess which page a paragraph came from.

RAG ingestion is zero-friction. Want your site's content in a knowledge base? Point the ingestion pipeline at llms-full.txt. The content is already chunked by post (separated by ---), already in Markdown (the universal intermediate format for RAG), and already has metadata (frontmatter).

MCP servers can serve your content. An MCP server that makes your site queryable by agents? Fetch llms-full.txt on startup, chunk it, embed it. The plumbing that would normally require a custom scraper, HTML parser, and content extraction pipeline collapses to one HTTP GET.

Future AI search engines index you better. Perplexity, SearchGPT, Gemini search: these systems increasingly look for llms.txt as a signal of AI-readiness. Sites with llms.txt [surface in more AI-generated answers](https://www.slashdev.io/answers/what-is-llms-txt) because the content is pre-structured for consumption.

This is the architectural insight: the same content, authored once in Markdown, serves two completely different consumption patterns through two completely different interfaces.

                  ┌─────────────┐
                  │  Markdown   │
                  │   Source    │
                  │  (author)   │
                  └──────┬──────┘
                         │
             ┌───────────┼───────────┐
             │                       │
             ▼                       ▼
    ┌─────────────────┐     ┌─────────────────┐
    │  Astro Build    │     │  llms-full.txt  │
    │  → HTML/CSS     │     │  Generator      │
    │  → JS bundle    │     │  → Markdown     │
    └────────┬────────┘     └────────┬────────┘
             │                       │
             ▼                       ▼
    ┌─────────────────┐     ┌─────────────────┐
    │    Browsers     │     │    AI Agents    │
    │    (humans)     │     │   (machines)    │
    └─────────────────┘     └─────────────────┘

No content duplication. No sync problem. One source of truth generates both interfaces. When you write a new post, the next build produces both the HTML page and the updated llms-full.txt automatically.

This is the same pattern that made APIs successful alongside web UIs: same data, different interface for different consumers. The web learned this lesson with REST APIs in the 2000s. We're learning it again now for AI consumption.

Implementing llms.txt took an afternoon. The llms.txt file itself is 30 lines of hand-written Markdown. The llms-full.txt generator is a short build script. The marginal cost of maintaining it is zero: it regenerates automatically every deploy.

The upside is unknown but structurally asymmetric. As AI agents become more prevalent, and as more people interact with content through Claude, ChatGPT, Perplexity, Cursor, and whatever comes next, having your content pre-structured for that consumption pattern is either table stakes or a differentiator. Either way, the cost was near-zero and the decision is irreversible in the good direction.
The sites that implemented RSS early didn't know exactly how it would be used either. Some of those feeds are still being consumed twenty years later by tools the authors never imagined.

The llms.txt convention is still early. Jeremy Howard's [original spec](https://llmstxt.org/) is intentionally minimal: an H1, a blockquote, sections of links. That's it. No schema validation, no required fields beyond the title, no versioning.

Open questions worth watching:

- Should llms-full.txt include a content hash or version identifier so agents can check if it's changed since last fetch?
- Should there be per-section files (llms-full-section.txt) for selective loading?
- Should the file carry a last-updated timestamp? One in the file itself is more reliable for agents that don't inspect HTTP headers.

For now, the baseline is clear: put an llms.txt at your root, generate an llms-full.txt at build time, add a permissions block, and make your content available to the next generation of consumers. The cost is an afternoon. The upside compounds.
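
On the content-hash question above, an agent does not have to wait for the format to change. One possible stopgap, sketched here under the assumption that the agent keeps the digest from its previous fetch, is to hash the body it downloaded and skip re-chunking and re-embedding when nothing changed. The function names are hypothetical and nothing here is part of the llms.txt spec.

```python
import hashlib
from typing import Optional


def corpus_digest(corpus: str) -> str:
    """Return the SHA-256 hex digest of a fetched llms-full.txt body."""
    return hashlib.sha256(corpus.encode("utf-8")).hexdigest()


def needs_reprocessing(corpus: str, last_digest: Optional[str]) -> bool:
    """True on the first fetch, or when the body differs from the cached digest."""
    return last_digest is None or corpus_digest(corpus) != last_digest
```

This saves the processing cost but not the download; avoiding the download as well would need the digest published somewhere cheaper to read, which is exactly what the open question is about.
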