` in a product description ends the JSON-LD block and opens an XSS vector [Source 7](#source-7).

## How It Works

JSON-LD embedded in the initial HTML response is the cheapest contract you can offer an extractor. Google's own guidance treats it as the recommended structured-data form precisely because it sidesteps JavaScript hydration delays that LLM-based crawlers handle poorly [Source 1](#source-1). Crawlers like GPTBot can parse schema directly out of HTML, and the trend over the last three years is unambiguous: WebSite, Organization, and Product schemas keep climbing while microdata declines [Source 3](#source-3). Inner pages remain undercovered: JSON-LD sits at ~39% on desktop versus 43% on home pages, and that gap is where most teams leak ambiguity to agents [Source 1](#source-1).

The contract framing matters because schema-on-write systems give the *reader* a stable surface to plan against, the same lesson Netflix learned with NMDB: a validated schema acts as an API contract that decouples writers from the many applications consuming the data [Source 2](#source-2). Without it, every consumer reimplements schema-on-read parsing logic with its own quirks [Source 5](#source-5). For an LLM agent, "schema-on-read" means the model invents a structure during inference, which is exactly the imagination problem Anthropic's tool-design guidance warns against ("if your schema just says user ID is a string, the agent might pass `John`, or `user 123`, or literally anything") [Source 10](#source-10).

WebMCP and similar emerging standards push this further: sites expose declarative tools whose schemas the agent calls directly, replacing thousands of vision tokens or DOM-parsing tokens with a single typed call [Source 9](#source-9).
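
The schema-on-write contract described above can be sketched for JSON-LD itself: validate on the writer side, then serialize into the initial HTML with `<` escaped so user-supplied text cannot close the script element. This is a minimal sketch, not a reference implementation. The `ProductRecord` shape, the function names, and the validation rules are all hypothetical, and `ProductJsonLd` is a hand-written subset of the schema.org `Product` type rather than a full typing.

```typescript
// Hand-written subset of schema.org Product; the field selection is an assumption.
interface ProductJsonLd {
  "@context": "https://schema.org";
  "@type": "Product";
  name: string;
  description: string;
  offers: {
    "@type": "Offer";
    price: string; // decimal string, e.g. "19.99"
    priceCurrency: string; // three-letter currency code, e.g. "USD"
  };
}

// Hypothetical input shape, standing in for whatever the CMS or database returns.
interface ProductRecord {
  name: string;
  description: string;
  priceCents: number;
  currency: string;
}

// Schema-on-write: reject malformed records before they reach the page.
function toProductJsonLd(record: ProductRecord): ProductJsonLd {
  if (record.name.trim() === "") {
    throw new Error("Product name must be non-empty");
  }
  if (!Number.isInteger(record.priceCents) || record.priceCents < 0) {
    throw new Error("priceCents must be a non-negative integer");
  }
  if (!/^[A-Z]{3}$/.test(record.currency)) {
    throw new Error("currency must be a three-letter uppercase code");
  }
  return {
    "@context": "https://schema.org",
    "@type": "Product",
    name: record.name,
    description: record.description,
    offers: {
      "@type": "Offer",
      price: (record.priceCents / 100).toFixed(2),
      priceCurrency: record.currency,
    },
  };
}

// Replace every "<" with its unicode escape; JSON parsers decode it back to "<".
function serializeJsonLd(jsonLd: ProductJsonLd): string {
  return JSON.stringify(jsonLd).replace(/</g, "\\u003c");
}

// Wrap the escaped payload in the script element that ships in the initial HTML.
function renderJsonLdScript(jsonLd: ProductJsonLd): string {
  return `<script type="application/ld+json">${serializeJsonLd(jsonLd)}</script>`;
}

const record: ProductRecord = {
  name: "Desk lamp",
  description: "Warm light </script><script>alert(1)</script>",
  priceCents: 1999,
  currency: "USD",
};

const html = renderJsonLdScript(toProductJsonLd(record));
console.log(html);
```

Because the escape is applied to the serialized string rather than to individual fields, any field added to the type later is covered without further changes.
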
JSON-LD is the lowest-rung version of that same idea (a passive, indexable contract), and the structured-output APIs every major model now ships (OpenAI's guaranteed JSON [Source 6](#source-6), Anthropic's `output_config.format` [Source 12](#source-12), Pydantic AI [Source 11](#source-11), Outlines [Source 13](#source-13)) mean the consumer side is fully aligned with typed I/O. The agent expects typed inputs from your page and produces typed outputs from your tools. Untyped HTML in the middle is the only mismatched link.

```
 Page render                Indexed contract             Agent runtime
┌────────────┐   JSON-LD    ┌─────────────────┐  query   ┌──────────────┐
│ Server     │ ───────────► │ Crawler /       │ ──────►  │ LLM extractor│
│ (RSC/SSR)  │  in initial  │ vector store /  │  typed   │ + tool call  │
│            │     HTML     │ knowledge graph │  facts   │ (structured  │
└────────────┘              └─────────────────┘ ◄──────  │  output)     │
      ▲                              ▲                   └──────┬───────┘
      │ schema-dts types             │ schema.org vocab         │
      └─── compile-time check ───────┴─── runtime validation ───┘
```

## When It Breaks

| Condition | What happens | Use instead |
|---|---|---|
| Schema injected post-hydration via client JS | LLM crawlers and many bots miss it; only ~2% of sites use JS-injected schema for a reason | `layout`/`page` server components so it ships in initial HTML [Source 7](#source-7) |
| | `WebSite` markup [Source 3](#source-3) | `WebSite`/`Organization` only on home and one canonical About page [Source 1](#source-1) |
| | `<` or `` [Source 7](#source-7) | `JSON.stringify(jsonLd).replace(/` |
| | is the only signal extractors get [Source 8](#source-8) | [Source 8](#source-8) |
| | [Source 4](#source-4) | [Source 4](#source-4) |

## CEMENT Brick

If your public pages ship meaning only in rendered prose and DOM, then AI agents (answer engines, shopping bots, research crawlers) will reconstruct that meaning probabilistically at thousands of tokens per page and disagree with each other about what your product, article, or organization actually *is*, because the consumer side of the web has already moved to typed
I/O (JSON schemas in tool calls, structured outputs in model APIs, knowledge graphs as agent context) and an untyped HTML middle is now the weakest contract in the chain.

## Sources

1. Engineering Docs
2. [Implementing the Netflix Media Database](https://netflixtechblog.com/implementing-the-netflix-media-database-53b5a840b42a)
3. Engineering Docs
4. Engineering Docs
5. Engineering Docs
6. [Agentic Info Extraction with Structured Outputs](https://www.youtube.com/watch?v=hpMCvfIIM_A)
7. [How to implement JSON-LD in your Next.js application](https://nextjs.org/docs/app/guides/json-ld)
8. [loading.js](https://nextjs.org/docs/app/api-reference/file-conventions/loading)
9. [The Rise of WebMCP](https://www.youtube.com/watch?v=35oWt7u2b-g)
10. [The 7 Skills You Need to Build AI Agents](https://www.youtube.com/watch?v=mtiOK2QG9Q0)
11. [PydanticAI - The NEW Agent Builder on the Block](https://www.youtube.com/watch?v=UnH7S5044GA)
12. Engineering Docs
13. [A new short course created with DotTxt is available now](https://www.youtube.com/watch?v=qUt0-B8s1vE)