Source: dev.to · Topic: artificial-intelligence

We spent 6 months feeding our compliance data to a major cloud AI. Here's what we got back.

A team of compliance engineers spent six months feeding their specialist screening data, edge cases, and analyst feedback into a major cloud AI platform, only to discover they were training a general-purpose model with their hardest-won expertise. The project revealed that the AI's false positive rates were brutal, model updates caused risk scoring to drift unpredictably, and the outputs became indistinguishable from competitors using the same cloud service. The team concluded that generic cloud AI commoditises compliance work, and that defensible products require domain-specific layers such as longitudinal monitoring and jurisdiction-specific workflows.

7 min read · Published May 28, 2026

TL;DR: We built our first generation of compliance tooling on top of one of the big three cloud AI platforms. We fed it our screening data, our edge cases, our analyst feedback loops. After six months we realised we were training their general-purpose model with our specialist knowledge, and getting back outputs that any other team could buy off the shelf. This is what we learned, what we ripped out, and what we built instead.

When we started building out our screening logic, the path of least resistance was obvious. Plug into a major cloud AI service. Use their text models for entity resolution. Use their classification models for risk scoring. Pipe our analyst review decisions back in as feedback signal. Ship fast, iterate faster.

It felt like the right call. The infrastructure was there. The latency was acceptable. The pricing looked manageable at low volume. And honestly, the demos looked great in front of customers.

What we didn't think about hard enough: every analyst decision we sent back through that pipeline was teaching a general-purpose system how to do compliance work. Our edge cases. Our adverse media patterns. Our PEP disambiguation logic. The stuff our team had spent years getting right.

We were paying for the privilege of training someone else's model with our hardest-won expertise.

The technical problems showed up before the strategic ones. Three things hit us in the first quarter.

First, false positive rates were brutal. The general-purpose model was good at language but had no native concept of why a name match in a high-risk jurisdiction matters more than the same match in a low-risk one. We ended up wrapping the AI calls in so much custom logic that the AI was essentially doing string comparison and we were doing the actual compliance work in deterministic code on top.
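To make the "deterministic code on top" point concrete, here is a minimal sketch of that kind of wrapper. It is illustrative only: the jurisdiction tiers, the weights, the threshold, and the function names are all hypothetical, not our production values. The string comparison uses Python's standard `difflib`, standing in for whatever similarity signal the model call returned.

```python
from difflib import SequenceMatcher

# Hypothetical risk weights per jurisdiction tier; illustrative values only.
JURISDICTION_WEIGHT = {"high": 1.0, "medium": 0.7, "low": 0.4}


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def match_risk(candidate: str, listed: str, jurisdiction_tier: str) -> float:
    """Scale the raw name similarity by the risk weight of the jurisdiction."""
    return name_similarity(candidate, listed) * JURISDICTION_WEIGHT[jurisdiction_tier]


def should_alert(candidate: str, listed: str, jurisdiction_tier: str,
                 threshold: float = 0.6) -> bool:
    """Raise an alert only when the jurisdiction-weighted score clears the threshold."""
    return match_risk(candidate, listed, jurisdiction_tier) >= threshold
```

With these assumed weights, an exact name match alerts in a high-risk jurisdiction but not in a low-risk one, which is the distinction the general-purpose model could not make on its own.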

Second, the model updates broke us. Twice. The platform pushed a new version of the underlying model and our risk scoring drifted. Cases that scored 0.4 on Monday scored 0.7 on Tuesday with no code change on our side. Try explaining that to a regulator. Try explaining it to a customer who has just had their entire alert queue rebalanced overnight.
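One way to catch this kind of drift before customers do is to re-score a fixed set of reference cases whenever anything upstream changes and compare against the stored scores. The sketch below assumes a hypothetical baseline of case IDs and scores and an arbitrary tolerance of 0.05; neither comes from our actual system.

```python
def find_drift(baseline: dict[str, float], current: dict[str, float],
               tolerance: float = 0.05) -> dict[str, float]:
    """Return case_id -> score change for every case that moved more than tolerance."""
    drifted = {}
    for case_id, old_score in baseline.items():
        new_score = current.get(case_id)
        if new_score is None:
            continue  # case was not re-scored; report that separately
        delta = new_score - old_score
        if abs(delta) > tolerance:
            drifted[case_id] = round(delta, 4)
    return drifted


# Hypothetical reference cases scored before and after a platform model update.
baseline = {"case-001": 0.40, "case-002": 0.15}
current = {"case-001": 0.70, "case-002": 0.16}
drift_report = find_drift(baseline, current)
```

Here `drift_report` flags only `case-001`, the Monday-to-Tuesday jump from 0.4 to 0.7, while the small movement on `case-002` stays within tolerance.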

Third, and this is the one that stung: our outputs started looking suspiciously similar to what other teams in our space were shipping. Same false positive patterns. Same edge case failures. Same blind spots. We were all drinking from the same well and producing the same water.

A Head of Compliance at a UK challenger bank put it to me cleanly when we were doing customer research: "If your model is the same model my last three vendors used, why am I paying you?"

This is where the engineering problem becomes a strategic one. The big cloud AI platforms are doing what Amazon did with Amazon Basics. They watch what sells, they generalise it, and they offer the same capability to everyone at a price point that erodes specialist margins.

For compliance specifically, the layers most exposed to commoditisation are clear:

| Layer | Commoditisation risk | Why |
| --- | --- | --- |
| Basic identity verification | High | Document OCR and liveness are now table stakes |
| Sanctions list matching | High | Standardised data, standardised algorithms |
| Generic entity extraction | High | General LLMs are genuinely good at this |
| Adverse media classification | Medium | Domain nuance still matters |
| Behavioural transaction monitoring | Lower | Requires longitudinal customer-specific data |
| Jurisdiction-specific EDD workflows | Low | Regulatory nuance, country-specific patterns |

If your product sits in the top three rows and you are running it on a generic cloud AI, you are in trouble. Not next year. Now. Your customers can replicate 70% of what you do with a weekend hackathon and a developer account. The layers that resist commoditisation are the ones where domain expertise compounds: longitudinal monitoring, jurisdiction-specific workflows, the messy human judgement that wraps the model output. That is where the real product lives.

We spent the next quarter doing two things in parallel.

We stopped sending analyst feedback to any external general-purpose model. Every decision our team made was a proprietary training signal. We treated it that way. The feedback loop now runs into models we control, on infrastructure we control, with versioning we control.

We also stopped using a single cloud AI as the brain. We moved to a routing pattern where different decisions go to different specialised systems. The routing logic itself became part of the product.

Here is roughly how the decision flow works now:

The key thing in that diagram is the loop at the bottom. Analyst decisions feed our models, not someone else's. The general-purpose AI is still there for narrow tasks, but it never sees the judgement layer.
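As a rough sketch of a routing pattern of this shape (not our actual implementation), the code below registers one handler per task and keeps analyst feedback in a log that never leaves the router. The task names, component names, and stub handlers are hypothetical placeholders for real models.

```python
from dataclasses import dataclass, field
from typing import Callable

Handler = Callable[[dict], str]


@dataclass
class Decision:
    task: str
    result: str
    handled_by: str


@dataclass
class Router:
    routes: dict[str, tuple[str, Handler]] = field(default_factory=dict)
    feedback_log: list[dict] = field(default_factory=list)  # stays in-house

    def register(self, task: str, name: str, handler: Handler) -> None:
        self.routes[task] = (name, handler)

    def decide(self, task: str, payload: dict) -> Decision:
        if task not in self.routes:
            raise KeyError(f"no route registered for task {task!r}")
        name, handler = self.routes[task]
        return Decision(task=task, result=handler(payload), handled_by=name)

    def record_feedback(self, decision: Decision, analyst_verdict: str) -> None:
        # Analyst judgement is stored locally, never sent to an external model.
        self.feedback_log.append({
            "task": decision.task,
            "model_result": decision.result,
            "handled_by": decision.handled_by,
            "analyst_verdict": analyst_verdict,
        })


# Stub handlers standing in for real components (hypothetical).
def extract_entities_stub(payload: dict) -> str:
    return ",".join(w for w in payload["text"].split() if w.istitle())


def score_risk_stub(payload: dict) -> str:
    return "high" if payload["jurisdiction_tier"] == "high" else "low"


router = Router()
router.register("entity_extraction", "general_purpose_llm", extract_entities_stub)
router.register("risk_scoring", "in_house_risk_model", score_risk_stub)

decision = router.decide("risk_scoring", {"jurisdiction_tier": "high"})
router.record_feedback(decision, analyst_verdict="confirmed")
```

The design choice worth noting is that `record_feedback` only writes to the in-house log, so the general-purpose component can be swapped or removed without touching the judgement data.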

Three things got measurably better.

False positive rates dropped because the specialist components were tuned for their narrow tasks rather than being asked to be good at everything. Our team stopped wrapping AI calls in defensive code and started using the outputs directly.

Versioning became boring, which is what you want. We control when our models update. Customers know what version is running. Audit trails are clean. Regulators can be shown a stable, explainable system rather than a black box that drifts every few weeks.

And, less measurable but more important: our outputs stopped looking like everyone else's. Our edge case handling is ours. Our risk scoring reflects how our customers actually think about risk. The product has a point of view again.

If you are building compliance tooling on top of a major cloud AI right now, ask yourself three honest questions.

Where does your analyst feedback go? If the answer is "into a general-purpose model we don't control", you are subsidising your competitors and giving away the most valuable training data in your business.

What happens when the underlying model version changes? If you cannot answer that with a specific test plan, you have a regulatory exposure waiting to happen. Compliance systems need stable, explainable behaviour. Drifting model versions are not stable.

Which layers of your product are genuinely defensible? Be honest. If the answer is "all of it because we have great UX", look again. UX is not a moat in compliance. The moat is in the workflow nuance, the domain feedback loops, and the integration depth that took years to build.

The cloud AI platforms are not the enemy. They are excellent infrastructure for the right tasks. The mistake is treating them as the brain of your compliance product. They are not. You are. Or you should be.

Should compliance teams stop using major cloud AI platforms entirely?

No. They are good infrastructure for narrow tasks like document parsing, language translation, and basic entity extraction. The mistake is using them as the decision layer or feeding proprietary analyst judgement back into them. Use them for what they are good at and control the rest.

What is the biggest risk of building KYC and AML on generic AI services?

Two risks tied together: model version drift breaking your regulatory explainability, and your analyst feedback loop training a general-purpose model that your competitors can also access. The first is an operational risk. The second is a strategic one.

How do you keep compliance AI explainable for regulators?

Control your model versioning, log every decision with the exact model version that produced it, and avoid using black-box general-purpose models as the final decision layer. Specialist models tuned for narrow tasks are easier to explain than a single large model doing everything.
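As an illustration of logging each decision with the exact model version, here is a minimal sketch of an audit record builder. The field names and the example model name and version are assumptions made for this example; the hashing and JSON calls are from Python's standard library.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional


def build_audit_record(case_id: str, model_name: str, model_version: str,
                       inputs: dict, score: float,
                       decided_at: Optional[datetime] = None) -> dict:
    """Build a record tying one decision to the model version that produced it."""
    decided_at = decided_at or datetime.now(timezone.utc)
    # Canonical JSON so the same inputs always hash to the same value.
    canonical_inputs = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return {
        "case_id": case_id,
        "model_name": model_name,
        "model_version": model_version,
        "input_sha256": hashlib.sha256(canonical_inputs.encode("utf-8")).hexdigest(),
        "score": score,
        "decided_at": decided_at.isoformat(),
    }


# Hypothetical example values.
record = build_audit_record(
    case_id="case-001",
    model_name="in_house_risk_model",
    model_version="2.3.1",
    inputs={"name": "Ivan Petrov", "jurisdiction_tier": "high"},
    score=0.7,
    decided_at=datetime(2026, 5, 28, 9, 0, tzinfo=timezone.utc),
)
```

Storing a hash of the canonicalised inputs alongside the version makes it possible to show later that a given score came from a given model on given inputs, provided the inputs themselves are retained elsewhere.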

Is specialist RegTech actually different from cloud AI under the hood?

The underlying machine learning techniques overlap, but the difference is in the workflow layer, the feedback loops, and the domain-specific tuning. A specialist platform should own its training data, version its models predictably, and route decisions to components built for the specific compliance task. If a vendor cannot explain how those three things work, they are probably reselling generic AI with a thin wrapper.

What should engineers ask vendors before integrating their compliance AI?

Ask where analyst feedback goes, who owns the resulting training data, how model versions are managed, and what happens during a regulatory audit when you need to explain a specific decision from eighteen months ago. The answers will tell you whether you are buying a product or renting a commodity.

We built Zenoo after living through this exact problem. If you are stitching compliance providers together on top of a generic cloud AI and the seams are starting to show, it might save you the same six months we lost.
