# Show HN: Review-oriented DOCX extraction toolkit for Rust

> Source: <https://github.com/artemnistuley/docx-review>
> Published: 2026-06-04 19:30:48+00:00

`docx-review`

is a review-oriented DOCX extraction toolkit for Rust.

It reads `.docx`

files directly from OOXML and produces structured JSON for:

- document blocks
- tracked changes
- comments and comment threads
- text anchors
- headers, footers, footnotes, and endnotes

The project is designed as headless infrastructure for automation, review workflows, AI pipelines, and downstream tools that need more than plain text.

`docx-review`

currently supports:

- paragraphs
- table cells as flat blocks
- tracked changes:
- insert
- delete
- replacement
- move
- format change

- comments
- comment anchors and anchored text
- comment threading and resolved state from
`commentsExtended.xml`

- footnotes and endnotes
- headers and footers
- list item detection by nesting level
- text spans with tracked-change markers

`crates/core`

`docx-review-core`

- extraction library and normalized data model

`crates/cli`

`docx-review`

- command-line interface

Install the CLI from crates.io:

```
cargo install docx-review-cli
```

This installs the `docx-review`

command.

Basic extraction:

```
docx-review extract document.docx
```

Pretty JSON:

```
docx-review extract document.docx --pretty
```

Tracked changes only:

```
docx-review extract document.docx --only track-changes --pretty
```

Comments only:

```
docx-review extract document.docx --only comments --pretty
```

Read from stdin:

```
cat document.docx | docx-review extract -
```

JSONL output:

```
docx-review extract document.docx --format jsonl
```

Notes:

`--format jsonl`

with no`--only`

emits one block JSON object per line.`--only comments --format jsonl`

emits one comment per line.`--only track-changes --format jsonl`

emits one tracked change per line.

Track changes modes:

```
docx-review extract document.docx --mode paired
docx-review extract document.docx --mode raw
docx-review extract document.docx --mode both
```

Useful flags:

`--no-comments`

`--no-text-spans`

`--include-raw-ids`

`-v`

,`-vv`

Add the crate:

```
[dependencies]
docx-review-core = "0.1"
php
use docx_review_core::extract_from_path;

fn main() -> Result<(), docx_review_core::Error> {
    let document = extract_from_path("document.docx")?;

    println!("blocks: {}", document.blocks.len());
    println!("tracked changes: {}", document.tracked_changes.len());
    println!("comments: {}", document.comments.len());

    Ok(())
}
```

With options:

```
use docx_review_core::{extract_from_path_with_opts, ExtractOptions, TrackChangesMode};

fn main() -> Result<(), docx_review_core::Error> {
    let mut opts = ExtractOptions::default();
    opts.track_changes_mode = TrackChangesMode::Both;
    opts.include_raw_ids = true;

    let document = extract_from_path_with_opts("document.docx", opts)?;
    println!("raw changes: {}", document.raw_changes.len());

    Ok(())
}
```

At a high level, extraction returns:

`Document.blocks`

- the normalized textual structure of the document

`Document.tracked_changes`

- review-oriented change records

`Document.comments`

- comments, anchors, replies, and resolved state

`Document.raw_changes`

- optional raw tracked changes when
`TrackChangesMode::Both`

is used

- optional raw tracked changes when

`blocks`

are the main content surface. Comments and tracked changes are linked back to blocks by id.

The current implementation is focused on review semantics and structural extraction.

Designed for:

- review metadata extraction from real Word documents
- tracked changes and comment workflows
- structural stories outside
`document.xml`

Not the current focus:

- editing
- image extraction
- full numbering style reconstruction

Run the CLI locally:

```
cargo run -p docx-review-cli -- extract path/to/document.docx --pretty
```


