# doceval — eval harness for LLM document extraction pipelines

> Source: <https://dev.to/dave8172/show-hn-doceval-eval-harness-for-llm-document-extraction-pipelines-3gd7>
> Published: 2026-06-16 12:29:37+00:00

I kept seeing the same gap: people ship LLM-based document extractors (invoices, receipts, forms) with no systematic way to know how accurate they actually are. So I built doceval. Point it at your extractor function and a labeled dataset, and you get back field-level accuracy, a failure taxonomy (`missed_field` / `hallucination` / `wrong_format` / `wrong_value`), and optional per-document cost tracking.
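To make the four failure categories concrete, here is a minimal sketch of how a single document's fields could be scored. It is not doceval's implementation: the function names (`normalize`, `classify_field`, `evaluate`), the sample fields, and the category definitions are all assumptions for illustration. In particular, it assumes `wrong_format` means "matches after stripping punctuation and case" and `hallucination` means "a value was returned for a field the label leaves empty".

```python
import re


def normalize(value):
    """Crude normalization (assumption): lowercase, keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", str(value).lower())


def classify_field(expected, predicted):
    """Map one (label, prediction) pair to 'correct' or a failure category."""
    if expected is None and predicted is None:
        return "correct"
    if expected is None:
        return "hallucination"   # extractor produced a value the label does not have
    if predicted is None:
        return "missed_field"    # label has a value, extractor returned nothing
    if expected == predicted:
        return "correct"
    if normalize(expected) == normalize(predicted):
        return "wrong_format"    # same content, different surface form
    return "wrong_value"


def evaluate(label, prediction):
    """Return (field-level accuracy, per-field outcome) for one document."""
    fields = sorted(set(label) | set(prediction))
    outcomes = {f: classify_field(label.get(f), prediction.get(f)) for f in fields}
    correct = sum(1 for outcome in outcomes.values() if outcome == "correct")
    accuracy = correct / len(fields) if fields else 1.0
    return accuracy, outcomes


# Hypothetical label and extractor output for one invoice.
label = {"invoice_number": "INV-001", "total": "1,250.00",
         "vendor": "Acme Corp", "po_number": None}
prediction = {"invoice_number": "INV-001", "total": "1250.00",
              "vendor": None, "po_number": "PO-77"}

accuracy, outcomes = evaluate(label, prediction)
print(accuracy)   # 0.25
print(outcomes)
```

Only `invoice_number` matches exactly, so one of four fields is correct. The other three each land in a different category: `total` is `wrong_format`, `vendor` is `missed_field`, and `po_number` is `hallucination`. A real harness would likely need per-field normalization (dates, currency, addresses) rather than the single crude rule used here.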

Works with any extractor (Claude, GPT, regex, rules) and any document schema. One JSON label file per document, one Python function, one CLI command.
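As a rough picture of the "one JSON label file, one Python function" pairing, the sketch below shows a rule-based extractor next to the label it would be compared against. The exact function signature and label schema doceval expects are not stated here, so everything in this block is an assumption: the `extract_invoice` name, the text-in/dict-out shape, the field names, and the inline JSON standing in for a label file.

```python
import json
import re

# Stand-in for the contents of one per-document label file (hypothetical schema).
LABEL_JSON = '{"invoice_number": "INV-1042", "total": "318.40"}'


def extract_invoice(text: str) -> dict:
    """Regex-based extractor (assumed shape): document text in, field dict out."""
    number = re.search(r"Invoice\s*#?\s*([A-Z]+-\d+)", text)
    total = re.search(r"Total:?\s*\$?([\d,]+\.\d{2})", text)
    return {
        "invoice_number": number.group(1) if number else None,
        "total": total.group(1) if total else None,
    }


document = "Invoice # INV-1042\nTotal: $318.40"
label = json.loads(LABEL_JSON)
prediction = extract_invoice(document)

print(prediction == label)  # True
```

Because the extractor is just a function returning a dict, swapping the regexes for a call to an LLM would not change how its output is compared with the label.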

Includes a working 20-document invoice example with a Claude Haiku extractor so you can run it immediately.
