Source: dev.to

doceval — eval harness for LLM document extraction pipelines

A developer built doceval, an evaluation harness for LLM document extraction pipelines that provides field-level accuracy, failure taxonomy, and optional cost tracking. The tool works with any extractor and document schema, requiring only a JSON label file, a Python function, and a CLI command. It includes a working 20-document invoice example with a Claude Haiku extractor.

Published Jun 16, 2026

I kept seeing the same gap: people ship LLM-based document extractors (invoices, receipts, forms) with no systematic way to know how accurate they actually are. So I built doceval — point it at your extractor function + a labeled dataset and get back field-level accuracy, a failure taxonomy (missed_field / hallucination / wrong_format / wrong_value), and optional per-document cost tracking.
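To make the four failure categories concrete, here is a minimal sketch of how a single field might be bucketed. This is not doceval's actual implementation: the function names, the normalization rule, and the exact definition of each category are assumptions chosen for illustration (for instance, "wrong_format" is taken here to mean "matches the label once whitespace, commas, and currency symbols are stripped").

```python
import re
from collections import Counter


def _normalize(value):
    # Assumed normalization: drop whitespace, commas, and currency symbols,
    # then lowercase, so "$1,200.00" and "1200.00" compare equal.
    return re.sub(r"[\s,$€£]", "", str(value)).lower()


def classify_field(expected, predicted):
    """Return an outcome for one field (hypothetical category definitions)."""
    if expected is None and predicted is None:
        return "correct"
    if predicted is None:
        return "missed_field"      # label has a value, extractor returned nothing
    if expected is None:
        return "hallucination"     # extractor produced a value the label lacks
    if expected == predicted:
        return "correct"
    if _normalize(expected) == _normalize(predicted):
        return "wrong_format"      # right content, wrong surface form
    return "wrong_value"


def score_document(label, prediction):
    # Compare every field that appears in either the label or the prediction.
    fields = sorted(set(label) | set(prediction))
    return {f: classify_field(label.get(f), prediction.get(f)) for f in fields}


# Made-up invoice data, for illustration only.
label = {"invoice_number": "INV-001", "total": "1200.00",
         "due_date": "2026-07-01", "po_number": None}
prediction = {"invoice_number": "INV-001", "total": "$1,200.00",
              "due_date": None, "po_number": "PO-77"}

outcomes = score_document(label, prediction)
counts = Counter(outcomes.values())
accuracy = counts["correct"] / len(outcomes)
print(outcomes)
print(f"field accuracy: {accuracy:.2f}")
```

Keeping "wrong_format" separate from "wrong_value" is useful in practice because the former can often be fixed with post-processing, while the latter points at the extractor itself.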

Works with any extractor (Claude, GPT, regex, rules) and any document schema. One JSON label file per document, one Python function, one CLI command.
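As a rough illustration of the "one label file per document, one Python function" shape, the sketch below pairs a regex extractor with a small loop that computes per-field accuracy. Everything here is an assumption rather than doceval's real interface: the label layout (a JSON object with `text` and `fields` keys), the extractor signature, and the `field_accuracy` helper are all hypothetical, and the CLI command is not shown because the post does not spell it out.

```python
import json
import re
import tempfile
from pathlib import Path


def regex_extractor(text):
    """Hypothetical extractor: document text in, dict of field values out."""
    number = re.search(r"Invoice\s+#?\s*([A-Z]+-\d+)", text)
    total = re.search(r"Total:\s*([\d.]+)", text)
    return {
        "invoice_number": number.group(1) if number else None,
        "total": total.group(1) if total else None,
    }


def field_accuracy(label_dir, extractor):
    """Run the extractor on every label file and return exact-match accuracy per field."""
    hits, seen = {}, {}
    for path in sorted(Path(label_dir).glob("*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        predicted = extractor(record["text"])
        for field, expected in record["fields"].items():
            seen[field] = seen.get(field, 0) + 1
            hits[field] = hits.get(field, 0) + (predicted.get(field) == expected)
    return {field: hits[field] / seen[field] for field in seen}


# Two made-up label files written to a temporary directory.
docs = [
    {"text": "Invoice #INV-001\nTotal: 120.50",
     "fields": {"invoice_number": "INV-001", "total": "120.50"}},
    {"text": "Invoice #INV-002\nAmount due 99.00",
     "fields": {"invoice_number": "INV-002", "total": "99.00"}},
]

with tempfile.TemporaryDirectory() as tmp:
    for i, doc in enumerate(docs):
        (Path(tmp) / f"doc_{i}.json").write_text(json.dumps(doc), encoding="utf-8")
    report = field_accuracy(tmp, regex_extractor)

print(report)
```

Because the harness only needs a callable that returns a dict, swapping the regex function for an LLM-backed one would leave the evaluation loop unchanged.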

Includes a working 20-document invoice example with a Claude Haiku extractor so you can run it immediately.
