cd /news/ai-safety/ai-governance-for-law-firms-what-pol… · home topics ai-safety article
[ARTICLE · art-43614] src=dev.to ↗ pub= topic=ai-safety verified=true sentiment=· neutral

AI Governance for Law Firms: What Policy Can't Catch

Blake Aber of Predicate Ventures argues that AI incidents in law firms, such as the Sullivan & Cromwell hallucination case, stem from infrastructure failures rather than policy gaps. He advocates for a 'harness layer' with citation verification, confidence thresholds, and drift monitoring to prevent errors at generation time. The piece distinguishes between policy as compliance and infrastructure as prevention, noting that tool vendors like Harvey or CoCounsel do not provide firm-specific safeguards.

read4 min views1 publishedJun 29, 2026

Where AI incidents in legal actually come from, and what infrastructure (not policy) prevents them.

Blake Aber · Predicate Ventures · 2026

When Sullivan & Cromwell apologized to a federal bankruptcy judge in April 2026 for AI hallucinations in a court filing, the firm's apology letter said the firm had policies. Safeguards existed. Those safeguards weren't followed.

That framing, "the safeguard existed but wasn't followed," is how a policy failure gets described. But something more specific happened: a hallucination was generated, wasn't caught at generation time, wasn't caught at review time, and made it into a document that got filed.

That's not a policy problem. It's an infrastructure problem.

The distinction matters because it determines what you build next.

Policy is a promise made before the event. A well-written AI acceptable-use policy says: don't submit output you haven't reviewed; verify citations before they go into a document; a human must approve anything client-facing.

This works when the human executing the task has time, attention, and professional accountability in that moment. It fails when one of those is missing: a deadline, a junior practitioner, a late-night run.

Policy can't:

Policy can:

One of those is prevention. The other is compliance.

An AI harness layer operates at the point of generation, not at the point of review. This reflects a broader reality that production AI is mostly harness and very little model. For legal work specifically, three components matter most:

1. Citation verification at generation time. Not a prompt instruction ("always verify citations"). A pipeline step. Every legal source referenced by the AI system is verified against a live database before the output is surfaced to the user. If the citation doesn't resolve, the output is flagged before it reaches a document. This is the component that would have caught the S&C error before it left the system.

2. Confidence thresholds with explicit escalation. Most LLMs don't communicate confidence levels by default. They output text with the same fluency whether the underlying generation is well-supported or a hallucination. A governed system adds a confidence scoring layer and an explicit routing rule. Output below threshold X goes to a review queue before it's surfaced. Output above threshold is still human-reviewed, but doesn't go through a secondary check. The threshold becomes an operationalized risk tolerance, not a general instruction to "be careful."

3. Eval drift monitoring. Models change. The same monitoring shows up where eval, audit, and rollback are treated as design inputs for clinical-adjacent AI, because regulated work in any field fails the same way without it. Vendor systems update, fine-tune, and swap underlying models without always surfacing clear versioning to their enterprise customers. A production AI system that performed well in testing may behave differently six months later. Without a monitoring layer that tracks output quality over time, that drift is invisible until it produces an incident. Eval monitoring is a production-phase activity, not a testing-phase one.

1. Policy is visible; harness is invisible. Boards, legal ops committees, and general counsel can review a policy document. They can't easily audit whether a citation-verification pipeline exists and is functioning. This creates pressure to solve the visible problem (produce a policy) at the expense of the invisible one (build the infrastructure). The S&C incident is the outcome when that trade-off is made.

2. The tool vendor is not responsible for your harness. Harvey, CoCounsel, and Lexis+ AI are tools. They don't know what your workflows require, what your confidence thresholds should be, or how their system integrates with your document management and review processes. The harness layer (the constraints and monitoring that sit between the tool's output and your firm's document) is your responsibility to design and build.

3. The governance hire alone doesn't solve it. Hiring a Chief AI Officer is the right move. But a CIAO with a policy mandate and no harness build is one deadline away from the same outcome as no CIAO. The value of the hire depends on whether the first deliverable is a policy framework or an infrastructure spec.

Before a new AI capability goes into production at your firm, ask:

These aren't vendor-evaluation questions. They're infrastructure questions. The firms that ask them before deployment are the ones that won't be writing apology letters.

── more in #ai-safety 4 stories · sorted by recency
── more on @blake aber 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/ai-governance-for-la…] indexed:0 read:4min 2026-06-29 ·