Mistral OCR 4 Ships: Bounding Boxes, Not Just Text

Mistral AI released Mistral OCR 4, a document intelligence API that returns bounding boxes, block classifications, and confidence scores alongside extracted text, enabling structured document processing without additional LLM passes. The API costs $4 per 1,000 pages, undercutting AWS Textract by 4 to 12 times for structured extraction tasks.

Mistral AI released Mistral OCR 4 today, and the headline feature isn’t accuracy; it’s structure. Where earlier versions of Mistral OCR and most competing services returned a flat text dump, OCR 4 returns bounding boxes for every content region, typed block classification, and inline confidence scores alongside the extracted text. That’s a meaningful shift: this is a document intelligence API, not just an OCR service.

The practical consequence is that developers building RAG pipelines, agentic document workflows, and compliance systems no longer need to layer a second LLM pass on top of OCR output to figure out what they’re reading. OCR 4 tells you whether a text block is a table cell, a contract clause, or an equation before you do anything with it.

Mistral OCR 4 Adds Bounding Boxes, Block Types, and Confidence Scores

The structural upgrade in OCR 4 consists of three new outputs alongside the existing markdown text. Bounding boxes localize every content region spatially, which matters for in-context highlighting, clickable citations, and layout-aware chunking. Block classification tags each region with one of at least seven types, including title, paragraph, table, equation, signature, figure, and caption. Confidence scores are available at page or per-word granularity; they are off by default to keep response payloads small and opt-in when you need them.

These aren’t separate API calls. A single request to Mistral’s OCR endpoint (https://docs.mistral.ai/api/endpoint/ocr) with mistral-ocr-latest returns all of this together. You pass a bbox_annotation_format parameter with a Pydantic or Zod schema to shape structured field extraction, and set confidence_scores_granularity to "word" or "page" when you need reliability signals.

    POST /v1/ocr
    {
      "model": "mistral-ocr-latest",
      "document": { "type": "pdf_url", "url": "..." },
      "confidence_scores_granularity": "word",
      "bbox_annotation_format": { /* Pydantic/Zod schema */ }
    }

Three Use Cases This Unlocks

Flat OCR output forced developers to write brittle parsing logic that broke on layout changes. OCR 4’s structured representation changes what’s practical to build. Three scenarios become much easier to solve:

Human-in-the-loop document pipelines. Confidence scores let you auto-approve high-confidence extractions and route low-confidence regions to human review. For invoice processing, contract review, or compliance workflows, this is the threshold-based routing that previously required a second model pass.

Semantic RAG chunking. Chunking PDFs by character count produces semantically incoherent retrieval units, such as half a table mixed with paragraph text. OCR 4’s block classification lets you chunk by block type: table cells go to one handler, equations are excluded from retrieval entirely, and paragraph text goes to the embedding pipeline. Better chunks mean better retrieval.

Agentic document workflows. Agents interacting with documents via flat text have to infer structure. With OCR 4’s output, an agent knows whether it’s reading a signature field or a terms clause and can act accordingly, triggering different tools or workflows based on block type.

The Pricing Case Against AWS Textract

Mistral OCR 4 costs $4 per 1,000 pages via the API and $2 per 1,000 pages via the batch API. Compare that to AWS Textract’s pricing for structured extraction: $15 per 1,000 pages for tables and $50 per 1,000 pages for forms. For the use cases where OCR 4 competes directly (structured extraction from invoices, contracts, and forms), Mistral is roughly 4 to 12 times cheaper.

The math gets stark at volume.
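As a quick check on those list prices, here is a minimal Python sketch that turns the per-1,000-page rates quoted above into a monthly bill and a price ratio. The rates are the ones given in this article; the 50,000-page volume and the helper names (`monthly_cost`, `price_ratio`) are illustrative assumptions, not anything published by Mistral or AWS.

```python
# Per-1,000-page list prices in USD, as quoted in this article.
RATES_PER_1K = {
    "mistral_api": 4.0,
    "mistral_batch": 2.0,
    "textract_tables": 15.0,
    "textract_forms": 50.0,
}


def monthly_cost(pages: int, rate_per_1k: float) -> float:
    """Cost in USD of processing `pages` pages at a per-1,000-page rate."""
    return pages / 1000 * rate_per_1k


def price_ratio(service: str, baseline: str = "mistral_api") -> float:
    """How many times more expensive `service` is than `baseline`."""
    return RATES_PER_1K[service] / RATES_PER_1K[baseline]


if __name__ == "__main__":
    pages = 50_000  # hypothetical monthly volume, one page per document
    for name, rate in RATES_PER_1K.items():
        print(f"{name:16s} ${monthly_cost(pages, rate):>9,.2f}")
    print("tables vs. Mistral API:", price_ratio("textract_tables"))
    print("forms vs. Mistral API: ", price_ratio("textract_forms"))
```

Real bills depend on how many pages each document has and on which Textract features a page actually triggers, so treat the output as an order-of-magnitude guide.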
According to a 2026 document AI cost comparison (https://aiproductivity.ai/blog/document-ai-cost-comparison/), processing 50,000 invoices per month runs roughly $3,250 with AWS Textract versus $100 with Mistral’s API or $50 via batch. At one page per invoice, those Mistral figures correspond to $2 and $1 per 1,000 pages; at the $4 and $2 rates quoted above, the same volume would cost about $200 via the API or $100 via batch, still a small fraction of the Textract bill.

Additionally, the self-hosted option (OCR 4 in a single container, available to enterprise customers) is a differentiator none of the big three offer. AWS Textract, Azure Document Intelligence, and Google Document AI all require sending documents to their cloud. For healthcare, finance, and legal teams with data residency requirements, self-hosting changes the conversation entirely.

Before You Switch: Two Caveats

Mistral reports strong benchmark numbers: 85.20 on OlmOCRBench (https://www.codesota.com/ocr/benchmark/olmocr-bench; top score among tested models), 93.07 on OmniDocBench, and a 72% win rate in independent annotator comparisons. These are legitimately impressive. However, they all come from Mistral’s own evaluation harness. Independent third-party verification hasn’t happened yet because the release is too new. Mistral itself recommends running tests on your specific document types before committing, because benchmark artifacts from formatting variations and ground-truth errors affect scores across the board.

There’s also a specific gotcha for developers using Mistral OCR via the Azure marketplace: the confidence_scores_granularity parameter is not supported in the Azure-hosted version. If you’re integrating through Azure, you won’t get per-word confidence scores until Microsoft adds support. This is a confirmed limitation, not a rumor: it is documented in Microsoft’s own Q&A.

Key Takeaways

- OCR 4 returns a structured document representation (bounding boxes, block type classification, and confidence scores), not just markdown text. This changes what developers can build on top of document ingestion.
- The three practical unlocks are human-in-the-loop pipelines (confidence-based routing), semantic RAG chunking (block-type-aware), and agentic document workflows (structure-aware agents).
- Pricing for structured extraction is $4/1K pages via the API and $2/1K via batch, roughly 4–12x cheaper than AWS Textract for equivalent use cases. A self-hosted container is available for regulated industries.
- Benchmark numbers are strong but sourced from Mistral’s own evaluation. Test on your documents before switching pipelines.
- If you’re using Mistral OCR via the Azure marketplace, confidence scores aren’t available yet; direct API access is required for full OCR 4 features.

The full release details are on Mistral’s blog (https://mistral.ai/news/ocr-4/). Given the pricing gap and the self-hosted option, this is worth testing against your current document processing stack, especially if you’re paying Textract rates for structured extraction.
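If you do run that test, the confidence-based routing and block-type-aware chunking described earlier can be prototyped in a few lines. The sketch below is an illustration only: this article does not show the OCR 4 response schema, so the `Block` fields (`type`, `text`, `confidence`), the queue names, and the 0.90 threshold are all assumptions you would need to adapt to the actual API output.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical shape of one OCR block; field names are assumed."""
    type: str          # e.g. "paragraph", "table", "equation", "signature"
    text: str
    confidence: float  # assumed to be normalized to the range [0, 1]


REVIEW_THRESHOLD = 0.90  # arbitrary cut-off; tune it on your own documents


def route(blocks, threshold=REVIEW_THRESHOLD):
    """Sort blocks into review, table, skipped, and embedding queues."""
    queues = {"review": [], "tables": [], "skipped": [], "embed": []}
    for block in blocks:
        if block.confidence < threshold:
            queues["review"].append(block)   # low confidence: human check
        elif block.type == "table":
            queues["tables"].append(block)   # dedicated table handler
        elif block.type == "equation":
            queues["skipped"].append(block)  # kept out of retrieval
        else:
            queues["embed"].append(block)    # everything else gets embedded
    return queues


if __name__ == "__main__":
    sample = [
        Block("paragraph", "Payment is due within 30 days.", 0.98),
        Block("table", "Item | Qty | Price", 0.95),
        Block("equation", "E = mc^2", 0.97),
        Block("signature", "J. Doe", 0.42),
    ]
    for name, items in route(sample).items():
        print(name, [b.text for b in items])
```

Checking confidence first means a low-confidence table still reaches a human reviewer instead of going straight to the table handler; whether that ordering is right for you depends on your review budget.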