# llms.txt — Making Your Site Navigable by Agents

> Source: <https://dev.to/amitrix/llmstxt-making-your-site-navigable-by-agents-4dd0>
> Published: 2026-06-06 07:05:54+00:00

- Over 844,000 sites had implemented `llms.txt` by late 2025. That is still under 1% of active websites, meaning early adoption is a differentiator; Cloudflare, Anthropic, Stripe, and Vercel all ship one.
- `llms.txt` gives agents a curated structured map; `llms-full.txt` gives them the entire corpus in one HTTP GET.
- A `## Permissions` block is the clearest machine-readable signal a site owner can give about AI inference-time use; `robots.txt` was designed for crawling, not for this.

Your site has two audiences now. Browsers render your HTML for humans. Agents need something else entirely.

When an AI agent visits a website, it doesn't see your carefully designed layout, your navigation bar, or your footer links. It sees a wall of text extracted from DOM elements — ads, cookie banners, navigation chrome, JavaScript-rendered content that may not even load. The conversion from HTML to useful context is lossy, expensive, and unreliable.

[llms.txt](https://llmstxt.org/) fixes this. It's a Markdown file at your site's root that gives AI systems a curated, structured map of what's here and where to find it. Think of it as `robots.txt` for the inference era, except that instead of telling crawlers what to avoid, it tells agents what to consume and how.

HTML was designed in 1993 for documents rendered in browsers. Thirty years of evolution added navigation menus, advertising slots, JavaScript bundles, cookie consent modals, embedded tracking, and layout frameworks. All of it optimized for human visual processing.

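An AI agent processing that same page has to fetch the HTML, strip out everything that isn't content, and convert what's left into text it can use.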

Every step is lossy. Navigation links get mixed with content links. Code blocks lose formatting. Tables collapse. Metadata vanishes. The agent gets a degraded version of what you wrote.

This is the interface mismatch: your content is structured and valuable, but the delivery format (HTML) was never designed for machine consumption at inference time.

[Jeremy Howard proposed llms.txt](https://llmstxt.org/) in September 2024. The idea is simple: put a Markdown file at `/llms.txt` that serves as a curated index of your site for AI systems.

The format:

```
# Site Name

> One-line description of what this site is.

Optional context paragraphs — key information an agent needs
to understand everything else.

## Section Name

- [Page Title](https://url): Brief description of what's there

## Optional

- [Less Important Page](https://url): Can be skipped for shorter context
```

That's the entire spec. An H1 with the site name. A blockquote summary. Optional context. Then sections of links with descriptions. The `## Optional` section has special meaning: agents can skip it when context is tight.
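To make the format concrete, here is a minimal sketch of how an agent might read such a file. It is not a reference parser: the names `parse_llms_txt` and `LlmsTxt` are hypothetical, and the sketch assumes one link per list line and ignores the free-form context paragraphs.

```python
import re
from dataclasses import dataclass, field

# Matches "- [Title](https://url): description" (the description is optional).
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<desc>.*))?$"
)


@dataclass
class LlmsTxt:
    """Hypothetical container for the parsed pieces of an llms.txt file."""
    title: str = ""
    summary: str = ""
    sections: dict = field(default_factory=dict)  # section name -> list of links


def parse_llms_txt(text: str, include_optional: bool = True) -> LlmsTxt:
    doc = LlmsTxt()
    current = None  # name of the section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith(">") and current is None:
            # The blockquote summary may span several lines; join them.
            doc.summary = (doc.summary + " " + line[1:].strip()).strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc.sections[current].append(
                    (match["title"], match["url"], match["desc"] or "")
                )
    if not include_optional:
        # Assumption: a context-constrained agent simply drops "## Optional".
        doc.sections.pop("Optional", None)
    return doc
```

Calling `parse_llms_txt(text, include_optional=False)` is how a context-constrained agent could honor the `## Optional` convention in this sketch.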

The adoption has been rapid. [Over 844,000 sites](https://www.soar.sh/blog/llms-txt-file-brand-guide) had implemented llms.txt by late 2025. That's still less than 1% of all active websites — which means being in that group is currently a differentiator, not table stakes. Cloudflare, Anthropic, Stripe, Zapier, and Vercel all ship one. It's not a niche experiment — it's becoming baseline infrastructure for any site that expects AI systems to interact with its content.

llms.txt is the map. `llms-full.txt` is the territory.

Where llms.txt provides navigation and structure, llms-full.txt contains the actual complete content — every page, concatenated into a single Markdown file. One HTTP request, one response, full corpus.

The [relationship between the two](https://llms-txt.io/blog/llms-txt-and-llms-full-txt) is complementary:

| | llms.txt | llms-full.txt |
|---|---|---|
| Purpose | Navigation and structure | Complete content |
| Size | Small (< 10KB) | Large (can be multiple MB) |
| Use case | Quick orientation, selective retrieval | Full-context assistance, RAG ingestion |
| Analogy | Table of contents | The entire book |

Different AI tools use them differently. A chat assistant might read `llms.txt` to understand what's available, then fetch specific linked pages as needed. A development environment like Cursor or Claude Code might prefer `llms-full.txt`: load the entire corpus into context and work with complete knowledge. A RAG pipeline might ingest `llms-full.txt` wholesale and chunk it for semantic search.

The dual-file approach means you serve both patterns: selective retrieval for context-constrained systems, and full ingestion for systems with room.
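As a rough illustration of that choice, an agent could pick a file based on how much context it has to spare. This is only a sketch: `choose_source` is a hypothetical helper, the size of `llms-full.txt` is assumed to be known in advance (for example from a `Content-Length` header), and the four-bytes-per-token figure is a crude rule of thumb, not a property of any particular model.

```python
def choose_source(
    full_size_bytes: int,
    context_budget_tokens: int,
    bytes_per_token: float = 4.0,  # assumption: rough average for English text
) -> str:
    """Return the path an agent might fetch, given its context budget."""
    estimated_tokens = full_size_bytes / bytes_per_token
    if estimated_tokens <= context_budget_tokens:
        # The whole corpus fits: take the territory.
        return "/llms-full.txt"
    # Otherwise start from the map and fetch linked pages selectively.
    return "/llms.txt"
```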

This site runs on Astro, a static site generator that compiles everything to HTML at build time. The `llms.txt` and `llms-full.txt` files are generated as part of the same build process.

**`/llms.txt`** is hand-authored. It's a curated index: I decide what sections to highlight, what descriptions to write, what the site's one-line summary is. This is editorial work, not automation. It looks like this:

```
# Artificial Curiosity Labs

> Writing about AI-native work, agent infrastructure, and what happens
> when curiosity meets technology.

## Content

- [Blog](https://artificialcuriositylabs.dev/posts): All posts
- [About](https://artificialcuriositylabs.dev/about): Who I am
- [Full text for LLMs](https://artificialcuriositylabs.dev/llms-full.txt): Complete content
- [RSS Feed](https://artificialcuriositylabs.dev/rss.xml): Subscribe

## Topics

- AI-native work as an operating model
- AWS Bedrock — AgentCore, Claude models, inference patterns
- Claude Code — setup, ops, MCP server configuration
- Multi-agent architectures and patterns

## Permissions

This site grants permission to AI systems to index, retrieve,
and cite all content, provided attribution is given.
```

**`/llms-full.txt`** is auto-generated. A build script reads every `.md` file from the blog content directory, preserves frontmatter (title, date, description, tags), and concatenates them with `---` separators. The script runs in under a second as part of the normal Astro build.

The generator is straightforward: it reads the `.md` files from `src/data/blog/` and writes the concatenated result to `public/llms-full.txt`.

No runtime. No API calls. No database. Just a build step that reads files and writes a file. The output is a static asset served from the CDN like any other page — cached globally, available instantly.
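The actual build script isn't reproduced in this post, so what follows is only a sketch of the same idea, written in Python for brevity. The function name `build_llms_full`, the alphabetical ordering, and the blank lines around each `---` separator are all assumptions.

```python
from pathlib import Path


def build_llms_full(content_dir: Path, output_file: Path) -> int:
    """Concatenate every .md file in content_dir into one Markdown file.

    Returns the number of posts written.
    """
    # Sorted so repeated builds produce byte-identical output.
    posts = sorted(content_dir.glob("*.md"))
    # Each file is copied verbatim, so its frontmatter block is preserved.
    bodies = [post.read_text(encoding="utf-8").strip() for post in posts]
    output_file.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: posts are separated by a `---` line with blank lines around it.
    output_file.write_text("\n\n---\n\n".join(bodies) + "\n", encoding="utf-8")
    return len(posts)
```

With the paths mentioned above, the call would be `build_llms_full(Path("src/data/blog"), Path("public/llms-full.txt"))`, run before or alongside the Astro build.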

One detail worth calling out: the `## Permissions` section in my llms.txt explicitly grants AI systems the right to index, retrieve, and cite the content with attribution.

This matters because the legal landscape around AI training and inference-time retrieval is unsettled. `robots.txt` was designed for crawling, not for inference-time consumption. Some sites use `robots.txt` to block AI crawlers entirely. Others want their content consumed but not used for training.

The permissions block in `llms.txt` is the clearest signal a site owner can give: yes, AI systems may use this content at inference time, under these conditions. It's not legally binding in the way a license is, but it's an explicit, machine-readable statement of intent that removes ambiguity.

The payoff isn't theoretical. Here's what happens when your site has a well-structured `llms-full.txt`:

**Any AI agent can consume your entire site in one request.** No crawling, no pagination, no JavaScript rendering. A single fetch returns clean Markdown with preserved structure, links, and metadata.

**Citation becomes trivial.** When an agent pulls from your `llms-full.txt`, the source URL is known, the content is clean, and attribution is straightforward. Compare this to crawling HTML where the agent has to guess which page a paragraph came from.

**RAG ingestion is zero-friction.** Want your site's content in a knowledge base? Point the ingestion pipeline at `llms-full.txt`. The content is already chunked by post (separated by `---`), already in Markdown (the universal intermediate format for RAG), already has metadata (frontmatter).

**MCP servers can serve your content.** An MCP server that makes your site queryable by agents? Fetch `llms-full.txt` on startup, chunk it, embed it. The plumbing that would normally require a custom scraper, HTML parser, and content extraction pipeline collapses to one HTTP GET.

**Future AI search engines index you better.** Systems such as Perplexity, SearchGPT, and Gemini search increasingly look for `llms.txt` as a signal of AI-readiness. [Sites with llms.txt surface in more AI-generated answers](https://www.slashdev.io/answers/what-is-llms-txt) because the content is pre-structured for consumption.

This is the architectural insight: the same content, authored once in Markdown, serves two completely different consumption patterns through two completely different interfaces.

```
                    ┌─────────────┐
                    │  Markdown   │
                    │   Source    │
                    │  (author)   │
                    └──────┬──────┘
                           │
              ┌────────────┼────────────┐
              │                         │
              ▼                         ▼
    ┌─────────────────┐      ┌─────────────────┐
    │   Astro Build   │      │  llms-full.txt  │
    │   → HTML/CSS    │      │   Generator     │
    │   → JS bundle   │      │   → Markdown    │
    └────────┬────────┘      └────────┬────────┘
             │                         │
             ▼                         ▼
    ┌─────────────────┐      ┌─────────────────┐
    │    Browsers     │      │   AI Agents     │
    │    (humans)     │      │   (machines)    │
    └─────────────────┘      └─────────────────┘
```

No content duplication. No sync problem. One source of truth generates both interfaces. When you write a new post, the next build produces both the HTML page and the updated `llms-full.txt` automatically.

This is the same pattern that made APIs successful alongside web UIs — same data, different interface for different consumers. The web learned this lesson with REST APIs in the 2000s. We're learning it again now for AI consumption.

Implementing llms.txt took an afternoon. The `llms.txt` file itself is 30 lines of hand-written Markdown. The `llms-full.txt` generator is a short build script. The marginal cost of maintaining it is zero: it regenerates automatically every deploy.

The upside is unknown but structurally asymmetric. As AI agents become more prevalent — as more people interact with content through Claude, ChatGPT, Perplexity, Cursor, and whatever comes next — having your content pre-structured for that consumption pattern is either table stakes or a differentiator. Either way, the cost was near-zero and the decision is irreversible in the good direction.

The sites that implemented RSS early didn't know exactly how it would be used either. Some of those feeds are still being consumed twenty years later by tools the authors never imagined.

The llms.txt convention is still early. Jeremy Howard's [original spec](https://llmstxt.org/) is intentionally minimal — an H1, a blockquote, sections of links. That's it. No schema validation, no required fields beyond the title, no versioning.

Open questions worth watching:

- Should `llms-full.txt` include a content hash or version identifier so agents can check if it's changed since last fetch?
- Should there be per-section files, such as `llms-full-section.txt`, for selective loading?
- A `last-updated` timestamp in the file itself is more reliable for agents that don't inspect HTTP headers.

For now, the baseline is clear: put an `llms.txt` at your root, generate an `llms-full.txt` at build time, add a permissions block, and make your content available to the next generation of consumers. The cost is an afternoon. The upside compounds.
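On the content-hash question, the mechanics are simple even though the convention isn't settled. The sketch below shows one way a generator could fingerprint the corpus and an agent could compare it against the value from its last fetch. Nothing here is part of the llms.txt spec: the function names are hypothetical, and where the fingerprint would be published (in the file, in a sidecar, in a header) is exactly the open part.

```python
import hashlib
from typing import Optional


def content_fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of the corpus text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_changed(new_text: str, last_fingerprint: Optional[str]) -> bool:
    """True when the corpus differs from the last fetch (or was never fetched)."""
    return last_fingerprint != content_fingerprint(new_text)
```

Because the fingerprint depends only on the bytes of the file, a deterministic build (same posts, same order) yields the same value, and an unchanged site never looks stale.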
