{"slug": "documentai-bbox-benchmark", "title": "documentai bbox benchmark", "summary": "Malte Buettner benchmarked bounding box accuracy for Document AI models using pages from the FlashAttention-3 paper, testing Qwen, Kimi, and Mistral via OpenRouter. The evaluation scored models on coverage, intersection-over-union, and centroid distance to measure how well generated bounding boxes matched reference boxes from pdfplumber. Results showed inconsistent scores across multiple runs for some models, highlighting variability in open-weight model performance for document extraction tasks.", "body_md": "In my [previous post](https://www.maltebuettner.eu/documentai/), I talked a bit about recent developments in the field of DocumentAI. Now comes the practical part. The test document is the FlashAttention-3 paper from the [ExtractBench](https://github.com/ContextualAI/extract-bench) dataset. ExtractBench focused only on extraction, but I am also interested in the bounding boxes that the models return.\n\nBecause ExtractBench covered only a very limited selection of models, with no open-weight ones among them, I ran a few extractions via `OpenRouter`, especially to see how well Qwen, Kimi, and Mistral are doing. I took pages 1 and 13 of the [FlashAttention-3](https://arxiv.org/pdf/2407.08608) example and generated \"reference\" bounding boxes with [pdfplumber](https://github.com/jsvine/pdfplumber) (the paper is a native PDF). They are not perfect, but for a rough indication they are more than enough.\n\n* For some models I did not manage to generate extraction and bbox in one run. 
For these I ran separate extraction + bbox prompts.\n\n**Note: For some models I could not get consistent scores on OpenRouter, even after several runs.**\n\nThe bbox score is a bit over-engineered. It combines coverage (for how many fields were bboxes generated?), intersection-over-union (how well does the bbox \"fit\" the reference one? Also known as the [Jaccard index](https://en.wikipedia.org/wiki/Jaccard_index)), and centroid distance (is the bbox roughly in the correct area?):\n\nbbox score = coverage × (0.5 × mean IoU + 0.5 × centroid score)\n\nThis is the one-shot prompt, which asks for extraction and bboxes in a single run:\n\n```\nONE_SHOT_SYSTEM_PROMPT = \"Return only valid JSON matching the provided JSON Schema.\"\n\none_shot_user_prompt = f\"\"\"\nOnly use the provided page images. They are not necessarily consecutive pages.\nThe original PDF has 22 pages. If the schema asks for number_of_pages, use 22.\nPage mapping:\n- input image 1: original PDF page 1, page_index 0\n- input image 2: original PDF page 13, page_index 12\n\nEach scalar extraction field is an object with value and bbox. Use bbox null when\nthe value is not visible in the provided page images. Boxes are [x1, y1, x2, y2].\n\nJSON Schema:\n{annotated_extraction_schema_json}\n\"\"\"\n```\n\nI modified the original JSON schema a bit and added a `bbox` field to every value. 
See the example for the `ids` field:\n\n```\n{\n  \"ids\": {\n    \"value\": {\n      \"type\": [\"string\", \"null\"]\n    },\n    \"bbox\": {\n      \"type\": [\"object\", \"null\"],\n      \"properties\": {\n        \"page_index\": {\n          \"type\": \"integer\"\n        },\n        \"box\": {\n          \"type\": \"array\",\n          \"items\": {\n            \"type\": \"number\"\n          },\n          \"minItems\": 4,\n          \"maxItems\": 4\n        }\n      },\n      \"required\": [\"page_index\", \"box\"]\n    }\n  }\n}\n```\n\n", "url": "https://wpnews.pro/news/documentai-bbox-benchmark", "canonical_source": "https://maltebuettner.eu/posts/documentai-bbox-benchmark/", "published_at": "2026-05-14 00:00:00+00:00", "updated_at": "2026-05-30 14:39:24.529936+00:00", "lang": "en", "topics": ["large-language-models", "ai-research", "computer-vision", "natural-language-processing"], "entities": ["Malte Buettner", "ExtractBench", "ContextualAI", "OpenRouter", "Qwen", "Kimi", "Mistral", "FlashAttention-3"], "alternates": {"html": "https://wpnews.pro/news/documentai-bbox-benchmark", "markdown": "https://wpnews.pro/news/documentai-bbox-benchmark.md", "text": "https://wpnews.pro/news/documentai-bbox-benchmark.txt", "jsonld": "https://wpnews.pro/news/documentai-bbox-benchmark.jsonld"}}