Doclang-Project/Doclang

The DocLang Project has released DocLang, an AI-native markup format for unstructured content, including documents, images, and more. It maps cleanly to LLM tokens while preserving structure, semantics, layout, and geometry in a single, unambiguous representation for use with large language models (LLMs) and vision-language models (VLMs).

This repository is the home of the normative specification and the reference validator for DocLang. If you build with LLMs and VLMs on real-world content, this is where the standard lives.

The source of the specification is available in spec.md (https://github.com/doclang-project/doclang/blob/main/spec.md), and exports to other formats can be found in the exports/ directory (https://github.com/doclang-project/doclang/tree/main/exports).

You can install the validator from PyPI:

    pip install doclang

You can then validate a DocLang document as follows:

    doclang validate -n my_document.dclg.xml

For more details, see doclang/README.md (https://github.com/doclang-project/doclang/blob/main/doclang/README.md).

If you use DocLang in academic or technical work, please cite the specification:

    @misc{doclang_2026,
      title        = {DocLang: Universal AI Document Format},
      author       = {{DocLang Project}},
      year         = {2026},
      version      = {main},
      howpublished = {\url{https://github.com/doclang-project/doclang}},
    }

To work on this repository (setup, tests, reference generation, releases), see CONTRIBUTING.md (https://github.com/doclang-project/doclang/blob/main/CONTRIBUTING.md).

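If you need to check many files at once, the validate command shown earlier can be wrapped in a small script. The following is a minimal sketch, not part of the DocLang tooling. The helper names (build_validate_command, find_documents, validate_tree) are hypothetical. The "*.dclg.xml" pattern is inferred from the example filename. The assumption that a non-zero exit code means a failed validation is not confirmed by this README. The "-n" flag is copied verbatim from the command above.

```python
import subprocess
import sys
from pathlib import Path


def build_validate_command(path, executable="doclang"):
    """Return the argv list for the validate command quoted in this README."""
    # "-n" is copied as-is from the README example; its meaning is not assumed here.
    return [executable, "validate", "-n", str(path)]


def find_documents(root, pattern="*.dclg.xml"):
    """Collect candidate files under root, sorted for stable output."""
    # Assumption: DocLang documents use the ".dclg.xml" suffix seen in the example.
    return sorted(Path(root).rglob(pattern))


def validate_tree(root, executable="doclang"):
    """Run the validator on every matching file and return {path: exit_code}."""
    results = {}
    for path in find_documents(root):
        completed = subprocess.run(build_validate_command(path, executable))
        results[path] = completed.returncode
    return results


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    results = validate_tree(root)
    # Assumption: a non-zero exit code signals a failed validation.
    failures = [path for path, code in results.items() if code != 0]
    for path in failures:
        print(f"FAILED: {path}")
    sys.exit(1 if failures else 0)
```

Saved under a name of your choice and run with a directory argument, the script exits with status 1 if any file fails under the assumption above. If the doclang executable is not on PATH, subprocess.run raises FileNotFoundError.
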
DocLang is developed in the open and supported by the LF AI & Data Foundation (https://lfaidata.foundation/projects/). Learn more about the project at doclang-project (https://github.com/doclang-project).

DocLang is licensed under the Apache License 2.0. See LICENSE (https://github.com/doclang-project/doclang/blob/main/LICENSE) for details.