The AI-native document format

The Joint Development Foundation and LF AI & Data announced DocLang, a new open standard for machine-readable documents designed to replace PDF and DOCX in AI pipelines. DocLang encodes semantic tags, bounding boxes, and reading order natively, eliminating parsing errors and reducing token overhead for LLMs. The standard aims to improve AI accuracy by providing structured, trustworthy input for automated workflows.

PDF was built for print. DOCX was built for editors. DocLang is built for what comes next: a machine-readable document standard your models can actually trust.

The world's knowledge lives in formats designed for rendering, not understanding. Markdown was built for readers. HTML for browsers. LaTeX for typesetting. PDF for print. None were built for machines.

Modern AI pipelines assume clean, structured input. Real-world documents (contracts, invoices, research papers, regulatory filings) are neither. Parsers guess at reading order. Tables become flat text. Figures vanish. Metadata is stripped. The result: your model's accuracy is bottlenecked by document quality, not model quality, and you spend more engineering time wrangling pre-processing than building the product.

DocLang defines a structured, machine-readable format for documents of any type. Not a converter. Not an API. A standard, like JSON for data or HTML for the web, that any tool can implement and any pipeline can consume.

Every component carries a semantic tag, bounding box coordinates, and reading order, natively encoded in a format LLMs can ingest directly, without translation overhead. A table encodes its full grid structure via OTSL (Optimized Table Structure Language). A heading carries its level and page position. Your model doesn't have to guess.

Governance metadata (PII flags, RAG permissions, training constraints) lives inside