
Doclang-Project/Doclang

The DocLang Project has released DocLang, an AI-native markup format for unstructured content that maps to LLM tokens while preserving structure, semantics, layout, and geometry. The repository hosts the normative specification and reference validator, available via PyPI, with the project supported by the LF AI & Data Foundation under the Apache License 2.0. This standard aims to provide a single, unambiguous representation for documents and images used with large language models and vision-language models.

1 min read · published Jun 12, 2026

**DocLang is the AI-native markup format for unstructured content**, including documents, images, and more. It maps cleanly to LLM tokens while preserving structure, semantics, layout, and geometry in a single, unambiguous representation.

This repository is the home of the normative specification and the reference validator for DocLang. If you build with LLMs and VLMs on real-world content, this is where the standard lives.

The specification source is in spec.md, and exports to other formats are in the exports/ directory.

You can install the validator from PyPI:

```shell
pip install doclang
```

You can then validate a DocLang document as follows:

```shell
doclang validate -n my_document.dclg.xml
```
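To check several files at once, the command above can be wrapped in a small script. The following is a minimal sketch, not part of the DocLang tooling: the helper names `build_validate_command` and `validate_files` are hypothetical, and it assumes the `doclang` CLI exits with status 0 for a valid document and a non-zero status otherwise, which this page does not confirm. The `-n` flag is copied verbatim from the documented invocation.

```python
import subprocess
from pathlib import Path


def build_validate_command(path):
    """Return the documented CLI invocation for one file."""
    # Mirrors: doclang validate -n my_document.dclg.xml
    return ["doclang", "validate", "-n", str(Path(path))]


def validate_files(paths, run=subprocess.run):
    """Run the validator on each path and map file name -> passed?

    `run` is injectable so the function can be exercised without the
    CLI installed; by default it is subprocess.run.
    """
    results = {}
    for path in paths:
        completed = run(
            build_validate_command(path),
            capture_output=True,
            text=True,
        )
        # Assumption: exit status 0 means the document validated.
        results[str(Path(path))] = completed.returncode == 0
    return results
```

For example, `validate_files(Path(".").glob("*.dclg.xml"))` would validate every DocLang file in the current directory and return a dictionary that can be inspected or turned into a CI exit code.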

For more details, see doclang/README.md.

If you use DocLang in academic or technical work, please cite the specification:

```bibtex
@misc{doclang_2026,
  title        = {DocLang: Universal AI Document Format},
  author       = {{DocLang Project}},
  year         = {2026},
  version      = {main},
  howpublished = {\url{https://github.com/doclang-project/doclang}},
}
```

To work on this repository — setup, tests, reference generation, releases — see CONTRIBUTING.md.

DocLang is developed in the open and supported by the LF AI & Data Foundation. Learn more about the project at doclang-project.

DocLang is licensed under the Apache License 2.0. See LICENSE for details.
