# Doclang-Project/Doclang

> Source: <https://github.com/doclang-project/doclang>
> Published: 2026-06-12 06:26:55+00:00

**DocLang is the AI-native markup format for unstructured content**, including documents, images, and more. It maps cleanly to LLM tokens while preserving structure, semantics, layout, and geometry in a single, unambiguous representation.

This repository is the home of the normative specification and the reference validator for DocLang. If you build with LLMs and VLMs on real-world content, this is where the standard lives.

The specification source is in [spec.md](https://github.com/doclang-project/doclang/blob/main/spec.md),
and exports to other formats are in the [exports/](https://github.com/doclang-project/doclang/tree/main/exports)
directory.

You can install the validator from PyPI:

```
pip install doclang
```

You can then validate a DocLang document as follows:

```
doclang validate -n my_document.dclg.xml
```
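To check many documents at once, for example in CI, the command above can be wrapped in a short script. The following is a minimal sketch, not part of the DocLang tooling: the helper names `build_validate_command` and `validate_all` are hypothetical, the `*.dclg.xml` pattern is inferred from the example file name, and the script assumes the validator exits with a non-zero status when a document is invalid. It shells out to the CLI exactly as shown above rather than relying on any Python API of the package.

```python
import subprocess
from pathlib import Path


def build_validate_command(path):
    """Return the argv for the `doclang validate -n <file>` call shown above."""
    return ["doclang", "validate", "-n", str(path)]


def validate_all(directory, run=subprocess.run):
    """Run the validator on every *.dclg.xml file under `directory`.

    Returns the paths for which the command exited with a non-zero status.
    Assumption: a non-zero exit status signals a validation failure.
    The `run` parameter exists only so the command runner can be swapped out.
    """
    failed = []
    for path in sorted(Path(directory).rglob("*.dclg.xml")):
        result = run(build_validate_command(path))
        if result.returncode != 0:
            failed.append(path)
    return failed
```

Calling `validate_all("docs/")` (a placeholder directory) returns the list of files that failed, which a CI job could print before exiting with an error.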

For more details, see [doclang/README.md](https://github.com/doclang-project/doclang/blob/main/doclang/README.md).

If you use DocLang in academic or technical work, please cite the specification:

```
@misc{doclang_2026,
  title        = {DocLang: Universal AI Document Format},
  author       = {{DocLang Project}},
  year         = {2026},
  version      = {main},
  howpublished = {\url{https://github.com/doclang-project/doclang}},
}
```

To work on this repository (setup, tests, reference generation, releases), see [CONTRIBUTING.md](https://github.com/doclang-project/doclang/blob/main/CONTRIBUTING.md).

DocLang is developed in the open and supported by the [LF AI & Data Foundation](https://lfaidata.foundation/projects/). Learn more about the project at [doclang-project](https://github.com/doclang-project).

DocLang is licensed under the Apache License 2.0. See [LICENSE](https://github.com/doclang-project/doclang/blob/main/LICENSE) for details.
