{"slug": "per-page-markdown-source-endpoints", "title": "Per-page Markdown source endpoints", "summary": "A new specification proposes that documentation sites expose raw Markdown source at predictable URLs via a .md suffix or content negotiation, enabling AI agents and tools to fetch lossless, smaller payloads without parsing HTML. The approach, already adopted by Anthropic and Stripe, includes conventions for URL suffixes, Accept headers, and metadata hints like X-Markdown-Tokens to signal agent-readiness.", "body_md": "# Per-page Markdown source endpoints\n\nExpose every documentation page's raw Markdown source at a predictable URL — via a .md suffix on the canonical URL, content negotiation, or both. Agents pull source instead of parsing HTML.\n\n## What it is\n\nA per-page Markdown source endpoint exposes the original, unrendered Markdown of a content page at a predictable URL. An agent — an LLM, a documentation index, a CLI tool — fetches the Markdown instead of the HTML and gets the same content without the lossy round trip through DOM parsing, ad scaffolding, navigation chrome, and JavaScript hydration.\n\nTwo conventions are emerging, and the best sites do both.\n\n**1. URL suffix.** Append `.md`\n\nto the page path.\n\n```\nGET /docs/getting-started/        HTTP/1.1\nGET /docs/getting-started.md      HTTP/1.1\n```\n\nThe first returns HTML. The second returns the raw Markdown source with a `text/markdown`\n\ncontent type.\n\n**2. Content negotiation.** Send `Accept: text/markdown`\n\nto the canonical (HTML) URL and get Markdown back.\n\n```\nGET /docs/getting-started/        HTTP/1.1\nAccept: text/markdown\nHTTP/1.1 200 OK\nContent-Type: text/markdown; charset=utf-8\nContent-Location: /docs/getting-started.md\nVary: Accept\n```\n\nThe `Content-Location`\n\nheader tells the client the canonical Markdown URL; `Vary: Accept`\n\ntells caches the response depends on the request’s `Accept`\n\nheader.\n\n## Why it matters\n\n**Lossless source.** HTML extraction loses heading semantics, code-block language hints, callout syntax, and inline metadata. Markdown gives the agent exactly what the author wrote.**Smaller payloads.** A typical Markdown page is 5–20× smaller than its rendered HTML once you remove styling, scripts, and shared chrome.**No JavaScript runtime.** Crawlers that don’t run JS still get full content.**Aligned with**`llms.txt`\n\n.[llms.txt](/spec/agent-readiness/llms-txt/)and[llms-full.txt](/spec/agent-readiness/llms-full-txt/)point at pages —`.md`\n\nmakes the pointed-to representation directly usable.**Stable signal of intent.** Sites that expose`.md`\n\nare advertising “we want to be readable by agents,” which is itself a useful discovery signal.\n\nAnthropic’s documentation, Stripe’s, and a growing number of others ship the `.md`\n\nsuffix. Treat the canonical URL with `Accept: text/markdown`\n\nas the next layer up — same content, the URL stays canonical, caches behave correctly.\n\n## How to implement\n\n**Ship the suffix at minimum.** For every Markdown-sourced page, serve the same Markdown at the URL with `.md`\n\nappended.\n\n```\nHTTP/1.1 200 OK\nContent-Type: text/markdown; charset=utf-8\nCache-Control: public, max-age=3600\nX-Markdown-Tokens: 1240\n```\n\nInclude a YAML frontmatter block with the page’s metadata so agents get structured context alongside the prose: `title`\n\n, `url`\n\n(the canonical HTML URL), `updated`\n\n, `sources`\n\n, `licence`\n\n. Don’t ship implementation details an agent doesn’t need (build timestamps, internal IDs).\n\n**Hint the size.** Add an `X-Markdown-Tokens`\n\nresponse header carrying a rough token-count estimate for the body (any reasonable tokeniser — `tiktoken`\n\n’s `cl100k_base`\n\nis a fine default). The header is not a registered standard, but a growing number of agent-friendly sites ship it because it lets a caller decide whether to inline the page into a prompt, summarise it first, or skip it for a smaller one — without downloading the body. Recompute it whenever the Markdown changes; do not invent the number.\n\n**Advertise it in <head>.** Add a discovery link to the HTML version so crawlers and tools find the Markdown without guessing the URL pattern.\n\n```\n<link rel=\"alternate\" type=\"text/markdown\"\n      href=\"https://example.com/docs/getting-started.md\"\n      title=\"Getting started — Markdown source\">\n```\n\n**Add content negotiation if your edge supports it.** Cloudflare Pages Functions, Workers, Vercel Edge Middleware, Netlify Edge Functions, Fastly Compute, and Nginx can all branch on `Accept`\n\n. The middleware:\n\n- Matches the page URL pattern.\n- If\n`Accept`\n\ncontains`text/markdown`\n\n, fetches the`.md`\n\nasset internally and returns its body with`Content-Type: text/markdown`\n\n,`Content-Location: /…/page.md`\n\n, and`Vary: Accept`\n\n. - Otherwise serves HTML, but still appends\n`Vary: Accept`\n\nso caches don’t conflate the two representations of the same URL.\n\n**Mention it in llms.txt.** Document both modes near the top of the file so agents can opt in on first read.\n\n**Extend it to aggregate views, not just individual pages.** The same `Accept: text/markdown`\n\nnegotiation works for any HTML view backed by Markdown. This site ships all three modes on every spec page and also negotiates two aggregate URLs: `/`\n\nreturns [llms.txt](/spec/agent-readiness/llms-txt/), and [ /checklist/](/checklist/) returns a copy-and-paste\n\n[Markdown task list](/checklist.md).\n\n## Common mistakes\n\n- Shipping the suffix but forgetting\n`Content-Type: text/markdown`\n\n. Browsers will then download the file instead of displaying it, and agents may misclassify it. - Doing content negotiation without\n`Vary: Accept`\n\n. Caches will return whichever representation they cached first to every subsequent client, regardless of what they asked for. - Returning Markdown that still contains template includes, unresolved shortcodes, or framework-specific syntax (e.g.\n`{{ component }}`\n\n). Serve the fully-rendered Markdown, not the source-of-source. - Letting the Markdown drift from the HTML. The two should be generated from the same source at the same time, in the same build.\n- Putting the\n`.md`\n\nbehind authentication while the HTML is public. The two representations must have the same access policy.\n\n## Verification\n\n`curl -i https://example.com/docs/getting-started.md`\n\nreturns`200`\n\nwith`Content-Type: text/markdown; charset=utf-8`\n\n.`curl -i -H 'Accept: text/markdown' https://example.com/docs/getting-started/`\n\nreturns Markdown with`Content-Location`\n\nand`Vary: Accept`\n\n.- The HTML response to the same URL also carries\n`Vary: Accept`\n\n. - View source on any page — there is a\n`<link rel=\"alternate\" type=\"text/markdown\" …>`\n\nin`<head>`\n\n. - The\n`.md`\n\nbody contains the same headings, paragraphs, and code blocks as the rendered HTML, with frontmatter at the top.", "url": "https://wpnews.pro/news/per-page-markdown-source-endpoints", "canonical_source": "https://specification.website/spec/agent-readiness/markdown-source-endpoints/", "published_at": "2026-07-01 00:00:00+00:00", "updated_at": "2026-07-01 09:24:27.246003+00:00", "lang": "en", "topics": ["developer-tools", "ai-agents", "large-language-models"], "entities": ["Anthropic", "Stripe", "Cloudflare", "Vercel", "Netlify", "Fastly", "Nginx"], "alternates": {"html": "https://wpnews.pro/news/per-page-markdown-source-endpoints", "markdown": "https://wpnews.pro/news/per-page-markdown-source-endpoints.md", "text": "https://wpnews.pro/news/per-page-markdown-source-endpoints.txt", "jsonld": "https://wpnews.pro/news/per-page-markdown-source-endpoints.jsonld"}}