When the Document Leaves the Factory: Provenance and the Adversarial Check at the Boundary

wpnews.pro

A factory has doors. The interesting question is what you let out of them, and in what state.

A companion essay argued that an agentic coding loop is a control loop, and that the parts everyone leaves unspecified are the parts that matter: a setpoint you can name, a comparator that can actually say no and prove it, an andon cord that stops the line before a workaround becomes load-bearing. It closed the loop inside the factory, where you own the substrate. You can impose a kernel on your own repository. You can wire a gate into your own pre-commit hook. The comparator is a thing you build and control, and the agent works inside walls you poured.

But a factory has doors, and the whole point of a factory is that things leave it. The artefact a knowledge business ships is not a binary. It is a document: a memo, a contract markup, a due-diligence report, a pitch, a regulatory filing. And the moment that document crosses the boundary out of the factory, into a counterparty you do not control, every comfortable assumption from the first essay quietly fails. You cannot run your kernel on their desk. You cannot put a gate in their inbox. The comparator you so carefully built does not cross the boundary with the artefact.

So this essay is about the other half of the problem. Not closing the loop inside one factory, but what happens when the output of one agentic factory becomes the input to another, and a third, and the documents start talking to each other faster than anyone reads them. The control problem does not disappear at the door. It changes owner, and gets harder. As in the companion piece, I am borrowing the control-loop vocabulary for its intuition, not claiming the theorems transfer; I will try to flag where the analogy is load-bearing and where it is only suggestive.

The comparator does not cross the boundary

Inside your own system, verification is something you can mandate. You decide that a fact is not "verified" until a predicate that could have failed comes out true. You own the build, so you own the meaning of green.

Across an organisational boundary you own none of that. When a firm sends you a forty-page report, you have no access to the process that made it. You cannot see whether a human read it, whether the model that drafted it was grounded in anything real, whether the confident paragraph on page nineteen is load-bearing or invented. You receive the artefact stripped of its lineage, and you are asked to act on it. Your loop has an actuator (your own people and agents, acting on what the document says) and a goal (the decision you have to make), but the sensor reading you would need (is this document actually true?) is exactly the thing the sender did not send you.

This is an open loop again, and it is the same open loop the first essay diagnosed, only now the gap sits between two organisations rather than inside one. The sender closed their loop, or did not, behind a wall you cannot see over. You inherit their output as if it were ground truth, because you have nothing else to treat it as. An open loop with a powerful actuator is an unguided projectile. String several together, each agent firing on the unverified output of the last, and you have a cascade of open-loop actuators in which an error introduced at any stage propagates forward with nothing, at any hop, built to attenuate it.
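The arithmetic behind that cascade can be made concrete with a toy model. The sketch below assumes, purely for illustration, that each hop independently introduces an error with probability `p_introduce` and removes an inherited one with probability `p_catch`. Both parameters and the function name are hypothetical, and the numbers are not measurements of any real system.

```python
def contamination_after(hops: int, p_introduce: float, p_catch: float = 0.0) -> float:
    """Probability that the output of the last hop carries at least one error.

    Toy model: each hop first checks what it inherited (catching an error
    with probability p_catch), then adds a fresh error of its own with
    probability p_introduce. p_catch = 0.0 is the open-loop case.
    """
    contaminated = 0.0
    for _ in range(hops):
        surviving = contaminated * (1.0 - p_catch)
        contaminated = 1.0 - (1.0 - surviving) * (1.0 - p_introduce)
    return contaminated


if __name__ == "__main__":
    for hops in (1, 3, 10):
        open_loop = contamination_after(hops, p_introduce=0.1)
        checked = contamination_after(hops, p_introduce=0.1, p_catch=0.9)
        print(f"{hops:>2} hops  open loop: {open_loop:.3f}  check at each hop: {checked:.3f}")
```

With no check the result reduces to 1 - (1 - p)^n, which only grows with the number of hops. With a check at every hop it settles near a fixed level instead of climbing, which is the attenuation the open-loop chain lacks.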

Where can a comparator live, once you accept it cannot be installed on the far side of the door? Two moves are available to either party acting alone, with no cooperation from the other: the sender can make enough lineage travel with the artefact that the receiver can run a cheap comparator against it, or the receiver can supply the comparator itself, as the human act of judging what is wrong with what arrived. There are richer options when both sides will coordinate, and they matter: an independent third party that attests to the process, cryptographic signing that binds a claim to an identity, attested execution that proves how an output was produced. But those need the counterparty to agree on infrastructure, and most documents cross boundaries cold. This essay is mostly about the two unilateral moves, because they are the two you can deploy on Monday without anyone's permission, and almost no one is building either on purpose.

Semantic microplastics

First, why this becomes urgent now rather than being the ordinary fog of doing business that it has always partly been.

The first essay named an asymmetry: agents collapsed the cost of emitting a diff by orders of magnitude and barely touched the cost of verifying one. The same asymmetry crossing the boundary produces a worse object, because now the cheap emissions are not staying inside one repository where one team owns the consequences. They are posted out into a graph of organisations, where each recipient's agent treats the inbound document as context, grounds its own confident output on it, and posts that onward in turn.

Consider what that does to error. A human-authored document carried roughly human error, introduced at human speed, and a reader applied roughly human scepticism. In the new regime an agent generates a plausible passage with some fabricated content in it, a receiving agent ingests it as source material, summarises it confidently, and a third agent treats that summary as grounding for a recommendation. No one lied. Each step was locally reasonable. But the fabrication from step one has now been laundered through two more systems, each of which added confidence and stripped the caveats, and by the time a human looks it wears the uniform of a settled fact, cited in three places.

I have taken to calling this semantic microplastics. The analogy holds on the properties that count. The contaminant is introduced once and does not visibly harm anything at the point of introduction, so nothing rejects it then. It is small enough to pass every filter built for larger debris, the obvious falsehoods and the clumsy errors. And in a chain with no comparator at the hops it concentrates rather than dilutes: each restatement strips the hedge and adds confidence, so the same false claim grows more authoritative the further it travels. That last property has a precise scope. Absent a check at any hop, there is no passive step that removes the contaminant, only steps that intensify it; the moment one hop runs a real comparator, the concentration stops there. Two distinct things are worth separating: a single error gaining authority as it moves along a chain, and an error copied across the graph into many downstream documents at once. The first is intensification, the second is spread. Both are real, both compound, and the remedy for both is the same: a comparator somewhere on the path.

This contamination has the same shape as two problems the companion essay met inside the factory, and naming the shape helps. There, a per-diff gate was blind to a defect that lived only in the accumulation of edits, not in any single one; and individually safe agents could combine into an unsafe coalition, safe joined to safe coming out unsafe. A document graph is the third instance. Every document can be clean and the graph still poisoned, because the contamination is a property of the combination, not the parts, and you no more certify the graph by checking each document than you certify a coalition by checking each agent or a refactor by checking each diff. Non-compositional failures are not caught by inspecting the nodes harder, only by a comparator on the edges between them, which is the structural reason it has to sit on the path rather than inside any node.

A colleague calls the demand side of the same problem AIDR, "AI Don't Read", and the phrase is worth keeping because it names the regime the rest of this essay lives in. Emission is now so cheap that the volume of text aimed at any one human has outrun the human's capacity to read it, so people have stopped reading and started pasting the document into their own model instead. The document now has two readers, and only one of them is the one you wrote it for. In the AIDR regime you write forty pages of human prose, the human reads none of it, their agent reads all of it, and whatever fragments of nonsense you let through are now grounding someone else's machine. The bandwidth problem and the contamination problem are the same problem, and the honest response is to design the document for the two readers it actually has.

The two-reader document

The structural move is to stop pretending the document has one reader. This is not, by itself, a new format. Bottom-line-up-front briefing, the executive summary over the appendix, the term sheet over the prospectus: writers have put a short layer over a long one for as long as busy principals have existed. None of those fixed the contamination problem, and it is worth being clear about why, because the difference is the whole point.

The existing pattern is incidental. The summary is a courtesy to a human in a hurry, and the detail beneath it is the same prose, written for the same human, that the human will not read. The two-reader document is designed, and its two layers are for two different readers with opposite constraints and, critically, two different authors.

On top, a short human layer: here is what you need to decide, here is the recommendation, here is what would change it. This is the layer a human has read and will stake their name on, and it is deliberately tiny, because the scarce resource it spends is human attention. It is not a model's compression of the layer below, produced by "make this shorter". It is authored by a human deciding which few things are load-bearing, which is exactly the judgement a model cannot be trusted to make about its own output, because picking what matters is the whole of the work.

Underneath, a machine layer: the full, structured, losslessly-detailed payload, every fact and figure and clause in a form another system can consume without a human in the loop. Not prose padding. The raw material, kept raw.

The document has become an interface between two organisations, and a good interface has a small, stable, human-meaningful surface over a rich, machine-readable body. The summary is the function signature. The payload is the implementation. One discipline the software analogy demands and the courtesy summary never had: the relationship between the layers must itself be checkable. The machine layer is the canonical payload, the human layer must be traceable to it, and a human summary that asserts something the payload does not support is a defect, not a nuance.
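One way to make the relationship between the layers checkable is to have every human-layer claim name the payload entries it rests on. The following is a minimal sketch under that assumption; `HumanClaim`, `untraceable`, the `F-...` ids and the sample facts are all hypothetical. It checks only that the references exist, not that the cited entries actually support the claim, which remains a human judgement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HumanClaim:
    text: str
    supports: tuple[str, ...]  # ids of payload entries the claim rests on


def untraceable(human_layer: list[HumanClaim], payload: dict[str, dict]) -> list[HumanClaim]:
    """Claims that cite nothing, or cite an id the payload does not contain."""
    return [
        claim
        for claim in human_layer
        if not claim.supports or any(ref not in payload for ref in claim.supports)
    ]


# Illustrative data only.
payload = {
    "F-001": {"fact": "Lease expires 2026-03-31", "source": "lease.pdf"},
    "F-002": {"fact": "No change-of-control clause found", "source": "spa_draft_v4.docx"},
}
human_layer = [
    HumanClaim("Renegotiate the lease before signing.", ("F-001",)),
    HumanClaim("There is no litigation risk.", ()),                # cites nothing
    HumanClaim("Change of control is unrestricted.", ("F-009",)),  # dangling id
]
defects = untraceable(human_layer, payload)
```

Anything returned in `defects` is a summary line asserting more than the payload records, which the essay's framing treats as a defect to fix before the document ships.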

The good lawyer has always known the human half of this. The difference between a good lawyer and a bad one is not how much they can write; it is that the bad one sends you ten pages of "you could do this, or this, or this", and the good one has done the reading and tells you the one thing to do. What is new is the second reader. The good lawyer never had to serve a machine that would read the ten pages so the human did not, and that is the half the profession has not yet designed for.

Provenance: the lineage that lets a receiver close the loop

The two-reader document solves the bandwidth half. It does nothing, on its own, for contamination. A beautifully short human layer over a beautifully complete machine payload is just a better-organised vector for microplastics if neither layer carries any evidence of where its claims came from. So attach the lineage to the artefact, because lineage is what lets a receiver run a comparator against your claims without the counterparty's cooperation and without seeing inside your factory.

Here precision matters, because "provenance" is usually made to carry four different jobs with four different trust roots, and a receiver who checks one thinks they have checked the others. Keep them distinct:

Origin (true provenance): where a claim came from, from what inputs, via what process. Trust root: the chain of custody.

Verification: that someone checked the claim against something, by what method, with what result, when. Trust root: the soundness and falsifiability of the method, since a check that could not have failed certifies nothing.

Attestation: a named party vouching for the verification. Trust root: that party's accountability.

Confidence: how reliable the claim is. Trust root: a calibration measured against outcomes, not a number the producing model assigned itself.

These can be collapsed into a single human-legible citation for a reader in a hurry, and should be, but they are four assertions, and a document that records origin while leaving verification blank is making a far weaker claim than one a careless reader will hear. A confidence of 0.8 that a receiver cannot trace to a calibration is not a measurement; it is a decimal. An "attested by" with no mechanism behind the name, no professional liability, no indemnity, no licence at risk, is a signature on nothing.
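Keeping the four assertions distinct is easier if the record itself refuses to merge them. A minimal sketch, assuming a hypothetical schema in which each job is its own optional field, so that a record holding only an origin visibly asserts only an origin. Every class and field name here is illustrative, not a proposed standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Origin:
    source: str
    process: str


@dataclass(frozen=True)
class Verification:
    method: str
    result: bool
    checked_on: date


@dataclass(frozen=True)
class Attestation:
    party: str
    accountability: str  # what the party has at risk, e.g. "professional indemnity"


@dataclass(frozen=True)
class Confidence:
    value: float
    calibration_ref: str  # pointer to the outcome record the value was calibrated on


@dataclass(frozen=True)
class Provenance:
    origin: Optional[Origin] = None
    verification: Optional[Verification] = None
    attestation: Optional[Attestation] = None
    confidence: Optional[Confidence] = None

    def asserted(self) -> list[str]:
        """Names of the assertions this record actually makes."""
        names = ("origin", "verification", "attestation", "confidence")
        return [n for n in names if getattr(self, n) is not None]


record = Provenance(origin=Origin(source="lease.pdf", process="manual extraction"))
```

Here `record.asserted()` lists only `"origin"`, so a receiver or a rendering layer can show that verification, attestation and confidence are blank rather than letting a citation imply them.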

What this buys the receiver is a cheap loop they could not otherwise close. They cannot run your kernel, but against a real lineage they can run a far cheaper test: does this claim cite a source, does the source resolve to something they can independently fetch, is the verification method one they would accept, is the named attester someone who would actually be on the hook if it were false, is the freshness inside the window where the claim could still be true. Two honest limits. A claim whose source is a sealed filing or a proprietary database is not independently fetchable, and there the receiver is back to trusting the attester rather than the source. And a plain text lineage of strings is a bill of lading you can print yourself: it delivers accountability, a named party and a trail a sceptic can pull, but it is not tamper-evidence. For the adversarial, machine-scale, or one-shot case, accountability is not enough, and you need the coordinated tier: signing that binds the claim to an identity, an established trust root so the receiver knows whose signature to expect, and a policy the receiver actually enforces. The unilateral version is the floor, not the ceiling.
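The receiver's cheap loop is a short list of predicates, each of which can fail. The sketch below is one hedged way to write it down, assuming the receiver supplies its own policy: a `resolves` callable that tries to fetch the source, a set of verification methods it accepts, a set of attesters it regards as accountable, and a freshness window. The `Lineage` shape and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Optional


@dataclass(frozen=True)
class Lineage:
    source: Optional[str]
    method: Optional[str]
    attester: Optional[str]
    checked_on: Optional[date]


def cheap_loop(
    lineage: Lineage,
    *,
    resolves: Callable[[str], bool],
    accepted_methods: set[str],
    accountable: set[str],
    max_age: timedelta,
    today: date,
) -> list[str]:
    """Return the reasons the lineage fails the receiver's policy (empty if none)."""
    failures = []
    if not lineage.source:
        failures.append("no source cited")
    elif not resolves(lineage.source):
        failures.append("source does not resolve")
    if lineage.method not in accepted_methods:
        failures.append("verification method not accepted")
    if lineage.attester not in accountable:
        failures.append("attester not on the hook")
    if lineage.checked_on is None or today - lineage.checked_on > max_age:
        failures.append("outside freshness window")
    return failures
```

An empty list means the lineage passed the cheap loop, not that the claim is true. As the paragraph above notes, when a source is sealed `resolves` cannot succeed and the decision falls back on the attester.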

Two rules make the floor hold rather than leak. First, completeness: an untagged claim is not a neutral absence, it is suspect by convention. Without that rule the mechanism is worse than nothing, because a sender can tag every claim they are confident about and leave the shaky ones as untagged confident prose, training the receiver to spend their scarce scepticism on the well-evidenced facts while the fabrications ride through unlabelled. Tag everything load-bearing, and treat the untagged as unverified by default. Second, absent provenance blocks a claim; it does not merely annotate it. The tempting design is for a receiving agent to lower its confidence in an unsourced fragment and carry on. That is annotate-and-continue, and it is the compounding mechanism wearing a hi-vis jacket: the lowered-confidence claim flows downstream, gets normalised into a summary, and re-emerges as grounding with the annotation stripped. A supply chain does not relabel uncertified goods and wave them through; it holds them at the dock. A claim whose provenance is absent, unverifiable, or past its freshness window must not be promoted to grounding at all. It routes to a human, or it stops.
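Both rules fit in a few lines once the gate is given only two exits. A minimal sketch, assuming claims arrive as dictionaries with an optional `"lineage"` key and that the receiver supplies a `lineage_failures` callable returning the reasons a lineage fails its policy; both are hypothetical conventions. The design point is what the function lacks: there is no branch that lowers a score and lets the claim through.

```python
from typing import Callable


def admit_for_grounding(
    claims: list[dict],
    lineage_failures: Callable[[dict], list[str]],
) -> tuple[list[dict], list[dict]]:
    """Split claims into (usable as grounding, held for a human).

    Completeness: a claim with no lineage is held, not assumed fine.
    Block, do not annotate: a claim whose lineage fails any check is held.
    """
    grounding, held = [], []
    for claim in claims:
        lineage = claim.get("lineage")
        if lineage is None or lineage_failures(lineage):
            held.append(claim)
        else:
            grounding.append(claim)
    return grounding, held
```

Only the first list should ever reach a downstream model as context. The second goes to a review queue or nowhere.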

This is also where the first essay's point about stale facts gets its higher-stakes form. Information that goes stale silently is worse than no information, because people act on it with confidence. Inside your own walls that costs you a bad afternoon. Crossing a boundary, into a counterparty with even less context than you had, a confident stale fact with no freshness record is a live round handed to someone who cannot tell it is loaded. The freshness field is the safety catch. It does not stop the fact being wrong. It stops it being silently wrong, which is the only kind that compounds.

Who is on the other side of the door

The supply-chain framing has a blind spot I have to name, because in the domains where this matters most the receiver is not a neutral downstream consumer. There are three quite different parties who might be reading, and the design changes for each.

A client receiving your work has interests aligned with yours. Provenance travels freely, the fuller the machine layer the better, and the two-reader document is an unambiguous gift: their agent gets the lossless payload, their people get the decisions.

A counterparty in a negotiation is adversarial, and here a full, structured, losslessly-detailed machine layer is not a gift, it is strategic exposure. You would be handing their model a clean, queryable index of every hedge, every qualification, every weak point you would rather they had to find by hand. The two-reader document does not mean "disclose everything to a machine". It means design the machine layer for the audience, which against an adversary is the deliberate, bounded set of facts you intend them to have, with the same care a good lawyer already applies to what goes in a disclosure letter.

A regulator or a court reads under formal rules, and those rules govern what metadata is disclosable. A structured lineage layer attached to a produced document may, depending on the jurisdiction and how the document is built, be discoverable, and disclosing a structured summary of privileged analysis to a counterparty can waive privilege over the advice it summarises. This forces a distinction the engineering instinct misses: the lineage you keep internally and the attestation that travels externally are not the same object. Internally you record that a fact came from the client's board call on a particular date. Externally what crosses the door is "verified by a named lawyer on that date against a category of source", never the privileged origin itself. The external attestation vouches for the check without disclosing what would be costly to disclose. Build the two as one object and you will eventually either leak privilege or refuse to attach provenance at all.
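The distinction between internal lineage and external attestation can be enforced by making them two types with a one-way projection between them. This is a minimal sketch with hypothetical names and fields. What is actually disclosable depends on jurisdiction, as noted above, so the choice of which fields cross the door here is an assumption for illustration and not legal guidance.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class InternalLineage:
    claim_id: str
    origin_detail: str    # potentially privileged; stays inside the firm
    source_category: str  # coarse description that is safe to disclose
    verified_by: str
    verified_on: date


@dataclass(frozen=True)
class ExternalAttestation:
    claim_id: str
    source_category: str
    verified_by: str
    verified_on: date


def to_external(record: InternalLineage) -> ExternalAttestation:
    """Project an internal record onto the fields allowed to leave."""
    return ExternalAttestation(
        claim_id=record.claim_id,
        source_category=record.source_category,
        verified_by=record.verified_by,
        verified_on=record.verified_on,
    )


internal = InternalLineage(
    claim_id="F-014",
    origin_detail="client board call, discussed under privilege",
    source_category="client-provided information",
    verified_by="A. Lawyer",
    verified_on=date(2024, 1, 10),
)
external = to_external(internal)
```

Because `ExternalAttestation` has no field that could hold the origin detail, a serialiser working from it cannot leak the detail by accident. The privileged record and the travelling one are different objects by construction.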

The adversarial check is the irreducible human act

Provenance lets a receiver close a cheap loop against lineage. It does not close the expensive one, the loop on whether the thing is actually right, because a perfect trail can faithfully record the lineage of a confident mistake. Something has to play comparator on substance, and across a boundary that something is a human. The question is which human act survives, because most of the candidates have just been automated.

It is not emission; the machine writes the first draft now and most of the final one. It is not, mostly, reading; the machine reads faster and more. The act that does not move is judgement of what is wrong with confident, well-formatted output. An agent produces work that looks finished, because looking finished is precisely what large models are best at. The irreplaceable human job is the adversarial one: to be the reader who assumes the polished artefact is wrong until shown otherwise, and who knows the domain well enough to find the place it is wrong. The thing acting and the thing judging must not be the same process, for the reason an auditor must not report to the executive whose numbers they audit, and that reason gets stronger, not weaker, the more capable the actuator becomes.

A common image among practitioners deploying these tools is the human as pilot, overseeing capable copilots rather than competing with them, treating model output as raw material and locating the differentiation in the judgement applied to it. That framing is right, and the two-reader document adds the one thing it leaves open: where the pilot should look. Not at the whole forty pages, which is unreadable, and the point was that they should not have to. At the human layer, the small set of load-bearing decisions, which is the only surface small enough to actually scrutinise. This resolves the apparent paradox in AIDR. The claim was never that humans should read nothing. It is that human attention is the scarcest input in the system and must be spent on the one thing only it can do, applied to the one surface where doing it is feasible. You cannot adversarially check forty pages. You can adversarially check the three decisions, hard, if someone has done the work of telling you those are the three.

The legal document, read as code

The domain where all of this lands hardest is law, and it is worth saying why with more care than the slogan "law is just coding with natural language" deserves, because the slogan is wrong in the ways that matter to a lawyer and the careful version is not.

Take the narrow, true claim first. An operative legal document, a contract clause, a regulatory submission, a formal certification, shares the properties that make code hard: small changes have large and non-local effects, the cost of a wrong clause is paid much later by someone who did not write it, and the artefact is read and acted on by parties who did not author it and cannot see the process that produced it. For those documents the engineering machinery of the last decades is genuinely apt. But law is not code in the way that counts most: code runs on a compiler that executes deterministically, and a legal document runs on judges and regulators who interpret, with discretion, against doctrine and commercial context. Legal language has open texture by design, it is defeasible, a valid clause can be overridden by an implied term or a statute or an equitable doctrine that postdates the drafting, and its meaning is fixed partly at interpretation time by a party with independent authority. None of that has an analogue in any execution environment. The analogy is strongest for operative, transactional text and weakest for the argumentative and advisory documents, the brief, the opinion, the memo, whose job is persuasion or judgement rather than instruction. Lump those together, as it is tempting to, and you will draw the wrong lesson about both.

And none of this is virgin ground. "Law as code" has a thirty-year history that already tested the strong version and found its edges: Szabo's smart contracts in the 1990s, the computational-legal-reasoning tradition running alongside them, the Rules-as-Code programmes several governments now run, purpose-built languages like Catala for encoding statute. The recurring finding is exactly the one above, that the operative and computable core formalises well and the interpretive penumbra resists. What the agentic moment changes is not that finding but the volume and the speed: the documents are increasingly machine-read and machine-drafted, and flowing between organisations faster than the old human discipline can keep up. The old conclusions about where formalisation stops still hold. The pressure on the part that does formalise has gone up by orders of magnitude.

Two pieces of the engineering toolkit are arriving in law whether or not the profession invites them. The first is version control with a reviewer that triages. Track-changes is a diff with the ergonomics removed: it shows every change with equal weight and leaves the partner to find the ones that matter and who made them. A pull request does the opposite, it attributes each change and increasingly lets an agent surface to the human only what is worth a partner's eyes, and the associate-drafts, partner-checks hierarchy maps onto it cleanly. The second is documentary accountability, which I am deliberately not calling formal verification, because that term should be reserved for kernel-checked proof and this is something humbler and still valuable: lineage taken seriously, where did this come from, when, who checked it, does it still cohere with what it touches.

Law is also where the absence of this has already produced the genre's defining failure, and it is worth describing accurately rather than as a morality tale. Lawyers have been sanctioned for filing briefs citing cases that did not exist, invented by a model and never checked; Mata v. Avianca is the case everyone names. The instructive part is that legal citation is already a provenance system, Bluebook and OSCOLA and the reporters and the databases exist precisely so a citation can be resolved and checked, and the failure was not the absence of a schema. It was a verification-discipline failure: a human did not run the check that the existing system made available. So the lesson is not "add provenance and the problem goes away", because the provenance existed. The lesson is the one the first essay made about prompt-prayer: discipline you have to remember is discipline you will eventually skip, and the fix is to move the check from something a tired associate is asked to do into something the workflow will not let them skip. Infrastructure beats discipline not because the trail is novel but because the enforcement is.

What does this look like on Monday, without waiting for anyone to adopt a standard? A due-diligence report ships in two layers: a short human layer stating the handful of findings that actually bear on the deal and the recommendation, authored and signed by the lawyer, over a structured machine layer of the underlying findings, each carrying its source document, the reviewer, and whether the citation was resolved. The associate's workflow gains one step that cannot be skipped, every external citation must resolve before the document can be issued, the same adversarial check the profession is already bolting on after Mata, moved upstream into the tooling. None of that requires the counterparty to run anything. It helps your own readers immediately, and it makes your document one a sceptic can check.
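The step that cannot be skipped is the simplest part to sketch. The following assumes a document represented as a dictionary with a `"citations"` list and a `resolve` callable that looks each citation up in whatever database the firm already uses. The exception, the function, and the placeholder citation strings are all hypothetical.

```python
from typing import Callable


class UnresolvedCitations(Exception):
    """Raised when a document is submitted for issue with unresolved citations."""

    def __init__(self, citations: list[str]):
        super().__init__(f"{len(citations)} citation(s) did not resolve: {citations}")
        self.citations = citations


def issue(document: dict, resolve: Callable[[str], bool]) -> dict:
    """Return the document marked as issued, or refuse if any citation fails."""
    unresolved = [c for c in document["citations"] if not resolve(c)]
    if unresolved:
        raise UnresolvedCitations(unresolved)
    return {**document, "status": "issued"}


# Stand-in for a real citation database lookup.
known = {"CASE-A", "CASE-B"}
draft = {"title": "DD report", "citations": ["CASE-A", "CASE-C"]}
try:
    issue(draft, resolve=lambda c: c in known)
except UnresolvedCitations as err:
    blocked = err.citations
```

There is deliberately no override flag. The only way to ship is to fix or remove the citation, which is what moves the check from discipline into infrastructure.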

Being honest about what this buys

The first essay conceded that making "done" computable does not abolish human judgement; it relocates it, from "is the code correct?" to "is this the right predicate?". This essay owes the same concession, more bluntly, because provenance is exactly as gameable as a kernel proof and for the same reason: it is a claim, and a claim can be vacuous.

A lineage record is worthless if the check behind it could not have failed. "Human-checked: yes" is theatre if the human rubber-stamped it, and the incentive to rubber-stamp grows precisely as volume grows, which is the regime we are entering. A confidence score is a lie wearing a number if it was generated by the same model whose output it scores, asked to rate its own work; self-assessment does not become a measurement by being expressed as a decimal. A "source" that is itself an unverified machine generation has not grounded anything; it has added a hop to the microplastics chain and dressed it as a citation. Provenance can launder a false "done" in a uniform, exactly as a vacuous predicate can, and it is harder to spot because it looks like diligence.

So the same discipline applies, pushed across the boundary. A lineage record counts only if the check it records could have come out the other way: a confidence calibrated against outcomes that were sometimes wrong, a "verified" naming a party who would actually answer for it, a source that resolves to something a third party can fetch. The receiver's cheap loop must include the adversarial assumption that the trail itself may be performance, and spend a little of the scarce human attention auditing the audit. This does not recurse forever; it terminates at a trust anchor, either a human whose accountability is institutionally enforced, by liability or professional indemnity, or a cryptographic root. Name the anchor and the regress stops; leave it unnamed and "trust the trail" is just "trust the output" with extra steps. The honest accounting is the same shape as the first essay's: the trail does not eliminate the judgement, it shrinks the surface the judgement must cover, from "is this whole document true?" to "is this lineage real, and does the human layer survive scrutiny?". Far smaller. Still not zero. Never zero.
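A confidence calibrated against outcomes is something a receiver can ask to see. A minimal sketch of the bookkeeping, assuming the sender has kept a history of (stated confidence, turned out correct) pairs; the function name and the ten-bucket default are arbitrary illustrative choices, not an established method the essay prescribes.

```python
from collections import defaultdict


def calibration_table(
    records: list[tuple[float, bool]], bins: int = 10
) -> dict[int, tuple[float, float, int]]:
    """Group past claims by stated confidence and compare with what happened.

    Returns {bucket: (mean stated confidence, observed hit rate, count)}.
    """
    buckets: dict[int, list[tuple[float, bool]]] = defaultdict(list)
    for confidence, correct in records:
        index = min(int(confidence * bins), bins - 1)  # 1.0 falls in the top bucket
        buckets[index].append((confidence, correct))
    table = {}
    for index, rows in sorted(buckets.items()):
        mean_confidence = sum(c for c, _ in rows) / len(rows)
        hit_rate = sum(1 for _, ok in rows if ok) / len(rows)
        table[index] = (mean_confidence, hit_rate, len(rows))
    return table


# Illustrative history: four claims stated at 0.75, three of which held up.
history = [(0.75, True), (0.75, True), (0.75, True), (0.75, False)]
table = calibration_table(history)
```

A bucket where the hit rate tracks the stated confidence is evidence that the number means something. A bucket with no recorded failures, or a confidence with no history behind it at all, is the decimal the essay warns about.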

The strongest objections

"This is just an executive summary plus a metadata field." The format is old; I said so. The claim is not that a short-over-long layout is novel, nor that metadata is. It is that the document is a control interface between organisations and almost no one designs it as one. Three things follow that the courtesy summary does not give you: the layers are authored by different parties for different readers rather than being one prose written for one reader, the lineage travels so a receiver can close a loop without seeing inside the sender, and the human's job is specified as adversarial judgement on a deliberately small surface. Bolt a provenance field onto a forty-page memo no one reads and you have better-labelled microplastics. The point is the whole interface, not the field.

"Why would the sender pay for this? It helps the receiver." This is the sharpest objection, because the cost falls on the sender and the benefit lands on the receiver, and in exactly the high-stakes adversarial settings where the argument is strongest, the sender's interests diverge most. A law firm sells confident, authoritative opinion; attaching a calibrated 0.7 to a key claim lets the counterparty discount it or aim at it. So the honest answer has a boundary. In bulk, low-stakes, AIDR-saturated exchange, where the receiver pastes everything into a model and never inspects a trail, the reputational signal does not land and the sender will not pay; provenance there is for the sender's own internal loop, not for the counterparty. The mechanism bites in the slow, high-stakes, repeated, adversarially-reviewed settings, and it bites through two routes. Reputation: a sender whose documents prove out under checking accrues trust, and one whose documents cannot be checked is, as volume rises, increasingly indistinguishable from a generator with no checks behind it. And a market for lemons: once receivers have been burned enough to start discounting unprovenanced documents on sight, the absence of a trail becomes the costly signal, and the discount, not anyone's virtue, is what makes the sender pay. There is also an internal barrier worth naming, because in a law firm it is the first one raised: hourly billing rewards the long document, not the short one, and the partners who would have to change the format are the ones the old format served. That is why the pressure for this comes from the buy side rather than the supply side, from the sophisticated client whose own agents have to read the firm's output and who can simply prefer the firm whose documents are checkable. Neither route is automatic, and where neither operates the proposal is honestly just internal hygiene with an external option.

"Provenance can be forged, so it buys nothing." It can be, which is why a trail that could not have failed counts for nothing here. But the objection proves too much: audited accounts can be cooked and we do not abandon audit. Plain lineage delivers accountability, not tamper-evidence, and the two are different tiers for different threat models. Between parties with an ongoing relationship and a human in the loop, accountability, a named party who staked themselves and a trail a sceptic can pull, is the right and proportionate mechanism. Where the exchange is adversarial, machine-scale, or one-shot, accountability is not enough and you need cryptographic binding to an identity and a trust root the receiver recognises, the genuinely harder engineering the unilateral version does not provide. What changes either way is the game: from "trust the confident output" to "produce a checkable trail or be visibly without one", which is strictly harder and more visible to get wrong.

"Models will get good enough that the receiver re-derives everything and provenance is moot." This is the strong objection, and the answer is independence, not capability. A receiver re-deriving a claim with the same class of generative model that produced it is not running a check, it is fetching a correlated second opinion, and correlated error is the entire microplastics mechanism; the more capable the chain, the more it needs a comparator independent of it, which across a boundary means a trail a different party can pull and a human who is not the author. There is a real qualification. A verifier that is not just a better generator but a retrieval-grounded system with access to primary sources the sender never had does break the correlation, and exactly on the decidable corner, does this citation resolve, is this date consistent with the filing record. That is genuine and it shrinks the surface that needs an external trail. But it does not erase it: on the reasoning layer, whether a conclusion actually follows from the sources, the verifier is doing the same kind of judgement the generator did, and there independence still has to come from outside the model. This is also why the self-improving case is the strongest argument for the independent comparator, not against it: a loop that rewrites itself has, by construction, no source of judgement uncorrelated with itself, so it is the last thing that should certify its own output. The economics point the same way. The receiver does not need a frontier model to re-derive the whole document, only a cheap loop against the lineage and one small independent check at the decidable corner, with the human still supplying the independence on the reasoning layer as above. That is the companion essay's inversion seen across the boundary: correctness bought with verification rather than with parameters and tokens.

"No counterparty will adopt your format." It does not require a shared format to start paying. The two-reader split helps your own readers on day one with no cooperation from anyone, and lineage that travels as plain human-legible citation, source, date, named checker, degrades gracefully: a counterparty who ignores it loses nothing, one who reads it gains a cheap loop, neither needs your schema. That is the floor, and it is genuinely unilateral. It is also, honestly, only the floor. The history of supply-chain provenance is that plain labels spread by liability and regulation and certification, not by goodwill, and that machine-scale trust eventually needs shared infrastructure, signing, registries, transparency logs, not just legible strings. The unilateral tier is where you start because it pays immediately; it is not where a mature version ends. You do not win this by waiting for a committee. You win it by being the sender whose documents are worth trusting, because they are checkable, in a market about to be flooded with documents that are not.

The doors were always the point #

The software factory was a term of derision for thirty years, and the first essay argued the derision was aimed at the wrong half: the problem was never the structure, it was a factory run as an open loop. The same correction applies one level up. We are about to run a great many agentic factories, and they will not sit in isolation. They will trade with each other, at machine speed, in documents, and the documents will be read mostly by other machines on behalf of humans who have stopped reading.

A factory in a supply chain is judged not only by what it makes but by what it lets out of its doors, and in what state, with what label, traceable to what process. We built that discipline for physical goods over a century, because contaminated outputs flowed downstream and poisoned everyone who assumed the upstream supplier had closed their own loop. The agentic document economy is a supply chain for claims, and it currently has no inspection at the doors and no labels on the goods. Semantic microplastics are what you get from a supply chain like that: a contaminant that compounds because nothing at any hop is built to remove it.

The fix is not to make the factory quieter. It is to treat the document as what it has become, the interface between two control loops neither of which can see inside the other, and to build it accordingly: designed for its two readers, carrying the lineage that lets the far side close a loop it could not otherwise close, scoped to who is actually on the other side of the door, and arriving in front of a human whose one irreplaceable job is to assume it is wrong until the small part that matters survives the assumption. Close the loop inside the factory, and then, because the factory has doors, close it at the door too. The doors were always the point.

About the author: Eduardo Aguilar Pelaez is CTO and co-founder at Legal Engine Ltd and an Honorary Research Fellow at the Department of Surgery and Cancer, Faculty of Medicine, Imperial College London. He writes on formal methods, AI agents, and the discipline of building systems that survive being walked away from. This is a companion to Continuous Enforcement and Continuous Verification.

source & further reading

aguilar-pelaez.co.uk — original article Continuous Enforcement and Continuous Verification