{"slug": "show-hn-review-oriented-docx-extraction-toolkit-for-rust", "title": "Show HN: Review-oriented DOCX extraction toolkit for Rust", "summary": "A Rust developer released `docx-review`, an open-source toolkit that extracts structured JSON from DOCX files, including tracked changes, comments, headers, and footnotes. The headless tool is designed for automation, review workflows, and AI pipelines that require more than plain text extraction from Word documents.", "body_md": "`docx-review`\n\nis a review-oriented DOCX extraction toolkit for Rust.\n\nIt reads `.docx`\n\nfiles directly from OOXML and produces structured JSON for:\n\n- document blocks\n- tracked changes\n- comments and comment threads\n- text anchors\n- headers, footers, footnotes, and endnotes\n\nThe project is designed as headless infrastructure for automation, review workflows, AI pipelines, and downstream tools that need more than plain text.\n\n`docx-review`\n\ncurrently supports:\n\n- paragraphs\n- table cells as flat blocks\n- tracked changes:\n- insert\n- delete\n- replacement\n- move\n- format change\n\n- comments\n- comment anchors and anchored text\n- comment threading and resolved state from\n`commentsExtended.xml`\n\n- footnotes and endnotes\n- headers and footers\n- list item detection by nesting level\n- text spans with tracked-change markers\n\n`crates/core`\n\n`docx-review-core`\n\n- extraction library and normalized data model\n\n`crates/cli`\n\n`docx-review`\n\n- command-line interface\n\nInstall the CLI from crates.io:\n\n```\ncargo install docx-review-cli\n```\n\nThis installs the `docx-review`\n\ncommand.\n\nBasic extraction:\n\n```\ndocx-review extract document.docx\n```\n\nPretty JSON:\n\n```\ndocx-review extract document.docx --pretty\n```\n\nTracked changes only:\n\n```\ndocx-review extract document.docx --only track-changes --pretty\n```\n\nComments only:\n\n```\ndocx-review extract document.docx --only comments --pretty\n```\n\nRead from stdin:\n\n```\ncat document.docx | docx-review extract -\n```\n\nJSONL output:\n\n```\ndocx-review extract document.docx --format jsonl\n```\n\nNotes:\n\n`--format jsonl`\n\nwith no`--only`\n\nemits one block JSON object per line.`--only comments --format jsonl`\n\nemits one comment per line.`--only track-changes --format jsonl`\n\nemits one tracked change per line.\n\nTrack changes modes:\n\n```\ndocx-review extract document.docx --mode paired\ndocx-review extract document.docx --mode raw\ndocx-review extract document.docx --mode both\n```\n\nUseful flags:\n\n`--no-comments`\n\n`--no-text-spans`\n\n`--include-raw-ids`\n\n`-v`\n\n,`-vv`\n\nAdd the crate:\n\n```\n[dependencies]\ndocx-review-core = \"0.1\"\nphp\nuse docx_review_core::extract_from_path;\n\nfn main() -> Result<(), docx_review_core::Error> {\n    let document = extract_from_path(\"document.docx\")?;\n\n    println!(\"blocks: {}\", document.blocks.len());\n    println!(\"tracked changes: {}\", document.tracked_changes.len());\n    println!(\"comments: {}\", document.comments.len());\n\n    Ok(())\n}\n```\n\nWith options:\n\n```\nuse docx_review_core::{extract_from_path_with_opts, ExtractOptions, TrackChangesMode};\n\nfn main() -> Result<(), docx_review_core::Error> {\n    let mut opts = ExtractOptions::default();\n    opts.track_changes_mode = TrackChangesMode::Both;\n    opts.include_raw_ids = true;\n\n    let document = extract_from_path_with_opts(\"document.docx\", opts)?;\n    println!(\"raw changes: {}\", document.raw_changes.len());\n\n    Ok(())\n}\n```\n\nAt a high level, extraction returns:\n\n`Document.blocks`\n\n- the normalized textual structure of the document\n\n`Document.tracked_changes`\n\n- review-oriented change records\n\n`Document.comments`\n\n- comments, anchors, replies, and resolved state\n\n`Document.raw_changes`\n\n- optional raw tracked changes when\n`TrackChangesMode::Both`\n\nis used\n\n- optional raw tracked changes when\n\n`blocks`\n\nare the main content surface. Comments and tracked changes are linked back to blocks by id.\n\nThe current implementation is focused on review semantics and structural extraction.\n\nDesigned for:\n\n- review metadata extraction from real Word documents\n- tracked changes and comment workflows\n- structural stories outside\n`document.xml`\n\nNot the current focus:\n\n- editing\n- image extraction\n- full numbering style reconstruction\n\nRun the CLI locally:\n\n```\ncargo run -p docx-review-cli -- extract path/to/document.docx --pretty\n```\n\n", "url": "https://wpnews.pro/news/show-hn-review-oriented-docx-extraction-toolkit-for-rust", "canonical_source": "https://github.com/artemnistuley/docx-review", "published_at": "2026-06-04 19:30:48+00:00", "updated_at": "2026-06-04 19:50:19.143646+00:00", "lang": "en", "topics": ["ai-tools", "ai-infrastructure"], "entities": ["docx-review", "Rust", "OOXML", "crates.io"], "alternates": {"html": "https://wpnews.pro/news/show-hn-review-oriented-docx-extraction-toolkit-for-rust", "markdown": "https://wpnews.pro/news/show-hn-review-oriented-docx-extraction-toolkit-for-rust.md", "text": "https://wpnews.pro/news/show-hn-review-oriented-docx-extraction-toolkit-for-rust.txt", "jsonld": "https://wpnews.pro/news/show-hn-review-oriented-docx-extraction-toolkit-for-rust.jsonld"}}