Paul Okhrem on RAG for compliance document review: from 3 hours to under 20 minutes

wpnews.pro


Source: dev.to · Published May 29, 2026


Paul Okhrem built a Retrieval-Augmented Generation (RAG) system that reduced compliance document review time from three hours to under 20 minutes per supplier package. The system automates document navigation, cross-document comparison, findings summary drafting, and gap tracking, while preserving the analyst's judgment for final decisions. Okhrem notes that the implementation required establishing document quality standards and three weeks of prompt calibration against specific compliance checklists.


By Paul Okhrem · paul-okhrem.com

The compliance analyst I spoke with described her day the way most people describe a commute they've stopped noticing — with a kind of resigned acceptance.

Every new supplier onboarding required reviewing three to seven documents: contracts, insurance certificates, data processing agreements, regulatory certifications. Cross-referencing clauses. Flagging gaps. Writing up findings. For each supplier. On a team that was doing 30 to 40 of these a month.

The work wasn't complex in the sense of requiring rare expertise. It was complex in the sense of requiring sustained attention across large, dense documents — finding the right clause, checking it against the standard, noting whether it met the threshold or didn't. Multiply that by 80 pages per document, five documents per supplier, and 40 suppliers a month, and you have a team that's spending most of its capacity on extraction and comparison rather than judgment.
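At the upper end of the figures above, the monthly reading load works out as follows:

```latex
\underbrace{80}_{\text{pages/document}} \times \underbrace{5}_{\text{documents/supplier}} \times \underbrace{40}_{\text{suppliers/month}} = 16{,}000\ \text{pages per month}
```

That is about 400 pages per supplier package. Actual volumes vary, since packages held three to seven documents and the team handled 30 to 40 a month.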

This is exactly the kind of problem RAG is suited for.

Retrieval-Augmented Generation is a pattern where a language model's responses are grounded in retrieved content from a specific document set — rather than relying on what the model was trained on. For compliance review, this means the model is reading the actual contract in front of it, not approximating what contracts typically say.
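To make the pattern concrete, here is a minimal Python sketch. It is not the system described in this article: it assumes the documents are already extracted to text and split into chunks, and it uses bag-of-words cosine similarity as a stand-in for the embedding model a production retriever would normally use. The function names (`retrieve`, `build_grounded_prompt`) and the sample clauses are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical sample: contract text already extracted and split into clause-sized chunks.
CONTRACT_CHUNKS = [
    "12.1 The Supplier shall maintain professional liability insurance of at least EUR 2,000,000.",
    "14.3 Either party may terminate this Agreement with 90 days written notice.",
    "9.2 Personal data shall be processed only on documented instructions from the Client.",
]

def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-count vectors (a stand-in for embedding similarity)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Score every chunk against the query and return the top k, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

def build_grounded_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    """Put only the retrieved excerpts in front of the model and forbid answers from outside them."""
    excerpts = "\n".join(
        f"[{i}] {text}" for i, (_, text) in enumerate(retrieve(question, chunks, k), start=1)
    )
    return (
        "Answer using only the numbered excerpts below and cite the excerpt number. "
        "If they do not contain the answer, reply NOT FOUND.\n\n"
        f"Excerpts:\n{excerpts}\n\nQuestion: {question}"
    )

print(build_grounded_prompt("What liability insurance must the supplier maintain?", CONTRACT_CHUNKS, k=1))
```

The prompt the model sees contains the insurance clause and nothing else from the contract, which is what "reading the actual contract in front of it" means in practice.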

The implementation we built worked roughly like this:

The analyst doesn't have to read the whole document to find the relevant clause. The system finds it. The analyst reviews the finding and makes the call.

The shift from 3 hours to under 20 minutes per supplier package didn't come from a single optimization. It came from removing several layers of low-value work:

Document navigation. Finding the right section in an 80-page contract is not trivial. Tables of contents are inconsistent. Clause numbering varies by jurisdiction and template. Keyword search in PDFs works until it doesn't. The retrieval layer handles this.
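Part of what the retrieval layer has to absorb is inconsistent clause numbering. A sketch of one preprocessing step, assuming contracts arrive as plain text: split on several heading styles ("Section 4.2", "Article 7", "12.", "4.2.1") so each chunk holds one clause. The regular expression and the `split_into_clauses` name are illustrative assumptions; real templates will need more patterns, and a wrapped line that happens to start with a dotted number (for example "3.5 million") would be misread as a heading.

```python
import re

# Hypothetical heading styles; real supplier templates will need more patterns.
HEADING = re.compile(
    r"^(?:"
    r"(?:section|clause|article)\s+(?P<kw>\d+(?:\.\d+)*)"  # "Section 4.2", "Article 7"
    r"|(?P<num>\d+(?:\.\d+)*)[.)]"                         # "12.", "3)", "4.2."
    r"|(?P<dotted>\d+(?:\.\d+)+)"                          # "4.2.1" with no trailing dot
    r")(?=\s|$)",
    re.IGNORECASE,
)

def split_into_clauses(text: str) -> list[dict]:
    """Split extracted contract text into one chunk per numbered clause."""
    clauses: list[dict] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            number = match.group("kw") or match.group("num") or match.group("dotted")
            clauses.append({"number": number, "text": line})
        elif clauses:
            clauses[-1]["text"] += " " + line  # continuation of the current clause
        else:
            clauses.append({"number": None, "text": line})  # preamble before the first heading
    return clauses

sample = (
    "Section 4.2 Insurance\nThe Supplier shall maintain cover\nof at least EUR 2,000,000.\n"
    "12. Termination\nEither party may terminate on\n90 days written notice."
)
for clause in split_into_clauses(sample):
    print(clause["number"], "|", clause["text"])
```

Chunking on clause boundaries, rather than fixed character windows, keeps a requirement and its qualifying sentences in the same retrievable unit.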

Cross-document comparison. When you need to confirm that what the contract says matches what the insurance certificate covers, you're holding two documents in your head simultaneously. The system can be queried across multiple documents in a single pass.
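A single-pass, cross-document query can be as simple as scoring every chunk in the package against one question and keeping the best match from each document, so the contract clause and the certificate entry land side by side. This sketch assumes chunks are tagged with a document ID and uses term overlap in place of embedding similarity; the package contents and the function names are invented for illustration.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap(query: str, passage: str) -> float:
    """Share of query terms that appear in the passage (stand-in for embedding similarity)."""
    q = tokens(query)
    return len(q & tokens(passage)) / len(q) if q else 0.0

def best_passage_per_document(query: str, corpus: list[tuple[str, str]]) -> dict[str, tuple[float, str]]:
    """Run one query over every document and keep each document's highest-scoring passage."""
    best: dict[str, tuple[float, str]] = {}
    for doc_id, passage in corpus:
        score = overlap(query, passage)
        if doc_id not in best or score > best[doc_id][0]:
            best[doc_id] = (score, passage)
    return best

# Hypothetical supplier package as (document ID, chunk) pairs.
PACKAGE = [
    ("contract", "12.1 The Supplier shall maintain professional liability insurance of at least EUR 2,000,000."),
    ("contract", "14.3 Either party may terminate this Agreement with 90 days written notice."),
    ("insurance_certificate", "Policy type: professional liability insurance. Limit of indemnity: EUR 1,000,000."),
    ("insurance_certificate", "Policy period: 1 January 2026 to 31 December 2026."),
]

for doc_id, (score, passage) in best_passage_per_document("professional liability insurance limit", PACKAGE).items():
    print(f"{doc_id} ({score:.2f}): {passage}")
```

The retriever only puts the two passages next to each other. Noticing that the certificate's EUR 1,000,000 limit falls short of the contract's EUR 2,000,000 requirement is left to the model's draft and, ultimately, the analyst.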

Drafting the findings summary. Compliance reviewers were spending real time writing up what they found — not analyzing it, just describing it. The model drafts the summary. The reviewer edits and approves.

Tracking gaps. When a required clause is missing entirely, confirming its absence in a long document is surprisingly time-consuming. The system can return "not found" with confidence when the retrieval consistently produces no relevant content.
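One way to make "not found" dependable is to ask for the same requirement several ways and report a gap only when every phrasing comes back weak. The sketch below assumes a similarity score between 0 and 1 (here, the share of query terms present in a chunk) and a hypothetical threshold of 0.5; both the scoring and the threshold would need tuning against real documents, and all names and sample text are illustrative.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def best_score(query: str, chunks: list[str]) -> float:
    """Highest share of query terms found in any single chunk."""
    q = tokens(query)
    if not q:
        return 0.0
    return max((len(q & tokens(c)) / len(q) for c in chunks), default=0.0)

def check_presence(phrasings: list[str], chunks: list[str], threshold: float = 0.5):
    """Report 'not found' only when every phrasing of the requirement scores below the threshold."""
    scores = [best_score(p, chunks) for p in phrasings]
    status = "found" if scores and max(scores) >= threshold else "not found"
    return status, scores

# Hypothetical contract chunks and requirement phrasings.
CHUNKS = [
    "12.1 The Supplier shall maintain professional liability insurance of at least EUR 2,000,000.",
    "14.3 Either party may terminate this Agreement with 90 days written notice.",
    "9.2 Personal data shall be processed only on documented instructions from the Client.",
]
BREACH_NOTICE = [
    "notify personal data breach within hours",
    "data breach notification deadline",
    "security incident reporting obligation",
]
TERMINATION = ["termination notice period", "terminate agreement written notice"]

print(check_presence(BREACH_NOTICE, CHUNKS))
print(check_presence(TERMINATION, CHUNKS))
```

With these samples the breach-notification requirement is reported missing, while the termination clause is found only through its second phrasing, which is why one weak query is not evidence of absence. Lexical matching also misses paraphrases, so a "not found" result is still something the analyst confirms.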

This is worth being direct about, because RAG implementations get oversold.

It doesn't replace the analyst's judgment. The system surfaces and synthesizes. It doesn't decide whether a finding is acceptable. Compliance is a domain where the edge cases matter enormously — a clause that technically meets the standard but in a context that creates risk requires a human to recognize that. The analyst still makes every call.
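The split described above can be enforced in the data model rather than left to convention: every finding starts as pending, the model's assessment is stored only as a suggestion, and only a named reviewer can resolve it. This is a minimal sketch; the class, field, and decision names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Decision(Enum):
    PENDING = "pending"  # surfaced by the system, not yet reviewed
    MEETS = "meets"      # analyst judges the requirement satisfied
    GAP = "gap"          # analyst judges it unmet, or risky in context

@dataclass
class Finding:
    checklist_item: str
    source_text: str        # retrieved clause shown to the analyst
    draft_assessment: str   # the model's suggestion; advisory only
    decision: Decision = Decision.PENDING
    reviewer: Optional[str] = None
    rationale: str = ""

    def decide(self, reviewer: str, decision: Decision, rationale: str) -> None:
        """Only a named reviewer can move a finding out of PENDING."""
        if decision is Decision.PENDING:
            raise ValueError("a decision must be MEETS or GAP")
        if not reviewer.strip():
            raise ValueError("a decision needs a named reviewer")
        self.decision, self.reviewer, self.rationale = decision, reviewer, rationale

def package_complete(findings: list[Finding]) -> bool:
    """A supplier package is signed off only when every finding has a human decision."""
    return all(f.decision is not Decision.PENDING for f in findings)
```

Recording the reviewer and rationale on each finding also gives the audit trail a named human decision for every checklist item.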

It requires good document quality. Scanned PDFs with poor OCR, handwritten addendums, non-standard formats — these degrade retrieval quality significantly. Part of the implementation work was establishing document submission standards for suppliers, which turned out to be as valuable as the AI tooling itself.
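Submission standards can be backed by an automated check before anything is indexed. The heuristic below is hypothetical, not the one used in this project: it assumes page text has already been pulled out by a PDF extractor or OCR, treats a page with almost no text as a likely image-only scan, and scores the share of tokens that look like clean words or numbers. The 200-character and 0.7 thresholds are placeholders.

```python
import re

# A "clean" token: a word (optionally hyphenated or with an apostrophe) or a number.
CLEAN_TOKEN = re.compile(r"[A-Za-z]+(?:[-'][A-Za-z]+)*[.,;:]?|\d[\d.,]*%?")

def page_quality(text: str) -> float:
    """Share of whitespace-separated tokens that look like words or numbers."""
    tokens = text.split()
    if not tokens:
        return 0.0
    clean = sum(1 for t in tokens if CLEAN_TOKEN.fullmatch(t))
    return clean / len(tokens)

def triage_document(pages: list[str], min_chars: int = 200, min_quality: float = 0.7) -> list[int]:
    """Return 1-based page numbers to send back to the supplier or route to manual review."""
    flagged = []
    for number, text in enumerate(pages, start=1):
        if len(text.strip()) < min_chars or page_quality(text) < min_quality:
            flagged.append(number)
    return flagged

good = "The Supplier shall maintain professional liability insurance of at least EUR 2,000,000."
bad = "Th~ Supp1ier sha|| ma!ntain pr0fessional l!ability 1nsurance of at |east EUR 2,0O0,000."
print(page_quality(good), page_quality(bad), triage_document([good, bad, ""], min_chars=20))
```

Rejecting a bad page at submission is cheaper than discovering later that retrieval silently missed a clause buried in OCR noise.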

It needs to be calibrated against your standards. A generic RAG system doesn't know what your compliance thresholds are. The prompts that drive the review queries have to be written against your actual checklist — which requires someone who understands both the compliance requirements and how to prompt effectively. This calibration took about three weeks of iteration before the outputs were reliable enough to trust.
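Calibration is easier to iterate on when the checklist lives as data rather than inside prompt strings. A sketch, assuming each checklist item carries its own requirement, pass condition, and retrieval query; the item IDs, the EUR and 48-hour thresholds, and the prompt wording are invented examples, not anyone's actual standard.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChecklistItem:
    item_id: str
    requirement: str   # wording taken from your own compliance checklist
    threshold: str     # the pass condition your team actually applies
    search_query: str  # what the retriever is asked to find

PROMPT_TEMPLATE = (
    "Checklist item {item_id}: {requirement}\n"
    "Pass condition: {threshold}\n"
    "Using only the excerpts below, answer MEETS, DOES NOT MEET, or NOT FOUND, "
    "then quote the exact sentence you relied on.\n\n"
    "{excerpts}"
)

def render_prompt(item: ChecklistItem, excerpts: list[str]) -> str:
    """Fill the shared template with one checklist item and its retrieved excerpts."""
    numbered = "\n".join(f"[{i}] {e}" for i, e in enumerate(excerpts, start=1))
    return PROMPT_TEMPLATE.format(
        item_id=item.item_id,
        requirement=item.requirement,
        threshold=item.threshold,
        excerpts=numbered or "(no excerpts retrieved)",
    )

# Hypothetical checklist entries.
CHECKLIST = [
    ChecklistItem("INS-01", "Supplier holds professional liability insurance.",
                  "Limit of at least EUR 2,000,000 per claim.",
                  "professional liability insurance limit"),
    ChecklistItem("DPA-03", "Processor notifies the controller of personal data breaches.",
                  "Notification without undue delay and no later than 48 hours.",
                  "personal data breach notification"),
]

print(render_prompt(CHECKLIST[0], ["12.1 The Supplier shall maintain professional liability insurance of at least EUR 2,000,000."]))
```

Keeping thresholds in one structure means each round of calibration edits the checklist rather than scattered prompt text, and version control then shows how the wording changed between iterations.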

Hallucination risk is real but manageable. Because the model is working with retrieved chunks from the actual documents, hallucination is less likely than in general-purpose AI use. But it's not zero. The system was designed to always show the source text alongside every finding, so the analyst is always verifying against the original. This is not a luxury — it's a requirement.
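Showing the source text alongside every finding can be paired with a mechanical check that the model's quoted evidence actually exists in the document. A minimal sketch, assuming each finding is a dict with a `quote` field: the quote must appear verbatim in the source, ignoring case, curly quotes, and line wrapping, or the finding is routed to manual checking. The function names and status strings are hypothetical.

```python
import re

def normalize(text: str) -> str:
    """Unify curly quotes, collapse whitespace, and lowercase so line wraps don't break matching."""
    for curly, plain in (("\u201c", '"'), ("\u201d", '"'), ("\u2018", "'"), ("\u2019", "'")):
        text = text.replace(curly, plain)
    return re.sub(r"\s+", " ", text).strip().lower()

def quote_is_grounded(quote: str, source_document: str) -> bool:
    """True only if the quoted evidence occurs verbatim (after normalization) in the source."""
    q = normalize(quote)
    return bool(q) and q in normalize(source_document)

def review_entry(finding: dict, source_document: str) -> dict:
    """Annotate a finding for the analyst; ungrounded quotes are routed to manual checking."""
    grounded = quote_is_grounded(finding.get("quote", ""), source_document)
    return {
        **finding,
        "grounded": grounded,
        "status": "ready for review" if grounded else "needs manual check",
    }

doc = "12.1 The Supplier shall maintain\nprofessional liability insurance of at least EUR 2,000,000."
print(review_entry({"item": "INS-01", "quote": "shall maintain professional liability insurance"}, doc))
print(review_entry({"item": "INS-01", "quote": "at least EUR 5,000,000"}, doc))
```

This catches fabricated or paraphrased quotes. It cannot tell whether a genuine quote actually supports the conclusion; that judgment stays with the analyst.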

After three months in production, the team had processed 140 supplier packages through the system. Average review time had dropped from just over 3 hours to 18 minutes per package. The analysts reported spending more of their time on the genuinely complex cases — the ones with unusual structures or missing documents that required actual compliance judgment — and less on mechanical extraction.
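Taking 180 minutes as a floor for the "just over 3 hours" baseline, the reported averages imply at least a tenfold speedup and, across the 140 packages, at least 378 review-hours saved:

```latex
\begin{aligned}
t_{\text{before}} &\geq 180\ \text{min}, \qquad t_{\text{after}} = 18\ \text{min} \\
\frac{t_{\text{before}}}{t_{\text{after}}} &\geq \frac{180}{18} = 10 \\
140 \times \left(t_{\text{before}} - t_{\text{after}}\right) &\geq 140 \times 162\ \text{min} = 22{,}680\ \text{min} = 378\ \text{h}
\end{aligned}
```

At the 30 to 40 packages a month mentioned earlier, the same per-package saving comes to roughly 81 to 108 review-hours a month.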

One finding that wasn't anticipated: the consistency of the reviews improved. When humans review documents under time pressure, findings vary. The same clause gets flagged by one reviewer and missed by another. The AI-assisted reviews were more consistent across the checklist, which had downstream benefits for the audit trail.

The team didn't get smaller. They got better at the parts of the job that actually required them.

Paul Okhrem consults on AI implementation for operations and compliance teams. More at paul-okhrem.com

Read original on dev.to → dev.to/elogic_commerce/paul-okhrem-on-rag-for-co…
