cd /news/ai-tools/show-hn-review-oriented-docx-extract… · home topics ai-tools article
[ARTICLE · art-21883] src=github.com pub= topic=ai-tools verified=true sentiment=↑ positive

Show HN: Review-oriented DOCX extraction toolkit for Rust

A Rust developer released `docx-review`, an open-source toolkit that extracts structured JSON from DOCX files, including tracked changes, comments, headers, and footnotes. The headless tool is designed for automation, review workflows, and AI pipelines that require more than plain text extraction from Word documents.

read2 min publishedJun 4, 2026

docx-review

is a review-oriented DOCX extraction toolkit for Rust.

It reads .docx

files directly from OOXML and produces structured JSON for:

  • document blocks
  • tracked changes
  • comments and comment threads
  • text anchors
  • headers, footers, footnotes, and endnotes

The project is designed as headless infrastructure for automation, review workflows, AI pipelines, and downstream tools that need more than plain text.

docx-review

currently supports:

  • paragraphs

  • table cells as flat blocks

  • tracked changes:

  • insert

  • delete

  • replacement

  • move

  • format change

  • comments

  • comment anchors and anchored text

  • comment threading and resolved state from commentsExtended.xml

  • footnotes and endnotes

  • headers and footers

  • list item detection by nesting level

  • text spans with tracked-change markers

crates/core

docx-review-core

  • extraction library and normalized data model

crates/cli

docx-review

  • command-line interface

Install the CLI from crates.io:

cargo install docx-review-cli

This installs the docx-review

command.

Basic extraction:

docx-review extract document.docx

Pretty JSON:

docx-review extract document.docx --pretty

Tracked changes only:

docx-review extract document.docx --only track-changes --pretty

Comments only:

docx-review extract document.docx --only comments --pretty

Read from stdin:

cat document.docx | docx-review extract -

JSONL output:

docx-review extract document.docx --format jsonl

Notes:

--format jsonl

with no--only

emits one block JSON object per line.--only comments --format jsonl

emits one comment per line.--only track-changes --format jsonl

emits one tracked change per line.

Track changes modes:

docx-review extract document.docx --mode paired
docx-review extract document.docx --mode raw
docx-review extract document.docx --mode both

Useful flags:

--no-comments

--no-text-spans

--include-raw-ids

-v

,-vv

Add the crate:

[dependencies]
docx-review-core = "0.1"
php
use docx_review_core::extract_from_path;

fn main() -> Result<(), docx_review_core::Error> {
    let document = extract_from_path("document.docx")?;

    println!("blocks: {}", document.blocks.len());
    println!("tracked changes: {}", document.tracked_changes.len());
    println!("comments: {}", document.comments.len());

    Ok(())
}

With options:

use docx_review_core::{extract_from_path_with_opts, ExtractOptions, TrackChangesMode};

fn main() -> Result<(), docx_review_core::Error> {
    let mut opts = ExtractOptions::default();
    opts.track_changes_mode = TrackChangesMode::Both;
    opts.include_raw_ids = true;

    let document = extract_from_path_with_opts("document.docx", opts)?;
    println!("raw changes: {}", document.raw_changes.len());

    Ok(())
}

At a high level, extraction returns:

Document.blocks

  • the normalized textual structure of the document

Document.tracked_changes

  • review-oriented change records

Document.comments

  • comments, anchors, replies, and resolved state

Document.raw_changes

  • optional raw tracked changes when TrackChangesMode::Both

is used

  • optional raw tracked changes when

blocks

are the main content surface. Comments and tracked changes are linked back to blocks by id.

The current implementation is focused on review semantics and structural extraction.

Designed for:

  • review metadata extraction from real Word documents
  • tracked changes and comment workflows
  • structural stories outside document.xml

Not the current focus:

  • editing
  • image extraction
  • full numbering style reconstruction

Run the CLI locally:

cargo run -p docx-review-cli -- extract path/to/document.docx --pretty
── more in #ai-tools 4 stories · sorted by recency
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/show-hn-review-orien…] indexed:0 read:2min 2026-06-04 ·