{"slug": "your-provenance-vector-dies-at-the-storage-boundary", "title": "Your Provenance Vector Dies at the Storage Boundary", "summary": "A developer identified two failure modes for provenance vectors in production: enforcement and persistence. To address enforcement, the developer proposes making unsafe actions unrepresentable via type systems, inspired by capability-based security. For persistence, the developer recommends structural compression per axis rather than naive summarization.", "body_md": "Last post I argued that agent trust should be a [typed provenance vector](https://dev.to/p0rt/trust-isnt-a-scalar-typed-provenance-for-agent-chains-229p): carry what-degraded-and-how alongside each result, propagate it, let each consumer apply its own policy. The comments agreed on the model and then immediately found the two places it breaks in the real world. Both are load-bearing, both were things I hand-waved, and this post is about them.\n\nBoth are right, and together they name the two ways a provenance vector dies in production: nobody reads it, or it can't survive being stored. One problem is about *enforcement*, the other about *persistence*.\n\n**TL;DR:** Two failure modes kill a provenance vector in production. **Enforcement:** if acting on a value doesn't *require* passing through the gate, developers (and models writing tool calls) will skip it, so make the unsafe path unrepresentable via types, not discipline. **Persistence:** on long-horizon agents the vector must survive compression to fit bounded memory, and naive summarization washes out exactly the axes you need, so compress structurally (per-axis, lossless scores + lossy lineage), not as prose.\n\nMykola's point is the one that should scare you, because it's true of almost every \"add metadata to make it safer\" scheme: the metadata is optional, so under deadline it gets skipped. 
You can ship a beautiful `Provenance` type and six months later find that the payment path reads `result.value` and never touches `result.provenance`. The lattice was perfect. Nobody consulted it.\n\nThe fix is not \"remember to check.\" Discipline doesn't scale and it definitely doesn't survive a model writing its own tool calls. The fix is to make *acting without checking* something the code physically cannot express.\n\nThis is a solved problem in a neighboring field, and it's worth stealing wholesale. Capability-based security has done this for decades: authority is an **unforgeable token you must hold a reference to** — you can't perform the action without possessing the capability, and possession is the check. Recent work brings this into static types explicitly: track the capability in the type system, and the *absence* of it in a function's type guarantees, at compile time, that the function can't perform the guarded action. The safety isn't a runtime assertion you might forget — it's a property of what typechecks.\n\nApplied to provenance, the move is: **the irreversible action can't accept a raw value, only a gated one.**\n\n``` python\n# Deferred annotation evaluation lets the signatures below name types\n# (Provenance, Policy, Money, Receipt) that are assumed to be defined elsewhere,\n# along with gate() and ProvenanceViolation.\nfrom __future__ import annotations\nfrom typing import Generic, TypeVar\nT = TypeVar(\"T\")\n\nclass Provenanced(Generic[T]):\n    \"\"\"A value you cannot use for a side effect without unwrapping —\n    and the ONLY unwrap path runs the gate.\"\"\"\n    def __init__(self, value: T, prov: Provenance):\n        self._value = value\n        self._prov = prov\n\n    def unwrap_for(self, action: Policy) -> T:\n        decision = gate(action, self._prov)\n        if decision != \"proceed\":\n            raise ProvenanceViolation(decision, self._prov)  # refetch / escalate / ...\n        return self._value\n\n# the side-effecting function's SIGNATURE refuses raw values:\ndef charge_card(amount: Provenanced[Money], policy: Policy) -> Receipt:\n    money = amount.unwrap_for(policy)   # the only way to get the Money out\n    ...\n```\n\nNow 
\"charge the card without checking provenance\" doesn't fail code review — it doesn't typecheck. There is no path from a raw `Money` to `charge_card`, because the signature demands `Provenanced[Money]`, and the only way to extract the value runs the gate. You've moved the enforcement from the developer's memory into the type system. It's the same trick as idempotency keys from two posts ago: don't ask people to remember the safe thing, make the unsafe thing unrepresentable.\n\n**The honest limit** (which a commenter will rightly raise, so I'll raise it first): this holds at the *framework boundary*, in typed code you control. The moment your agent writes free-form tool calls — the model generating Python that calls your API directly — it can simply not use the wrapper, and you're back to enforcement-by-hope. For that case the type system can't reach, so enforcement has to drop to the infrastructure layer: the side-effecting tools sit behind a proxy that refuses any call whose payload doesn't carry valid provenance. You lose compile-time guarantees and get runtime rejection instead — worse, but still \"structurally can't skip it\" rather than \"please remember.\" The principle survives even when the mechanism changes: enforcement lives in something the actor can't route around, never in something it's asked to honor.\n\nmote's problem is deeper and I didn't have an answer in the thread, so I went and found one. Here's the setup: a long-horizon agent — mote's case is literally robots on edge hardware with a hard context ceiling — can't hold a growing provenance graph in working memory across 500 steps. It has to compress. And the standard compression move, summarize-history-into-prose, is catastrophic for provenance specifically, because summarization is *lossy in an uncontrolled way* — it'll happily drop \"step 47 ran on a stale cache\" to save tokens, and that's the one fact a downstream gate needed.\n\nThis isn't hypothetical. 
The field now attributes the majority of enterprise agent failures to context drift and memory loss during multi-step reasoning — not to hitting the context limit, but to the *quality degradation on the way there*. And there's a subtler trap the RL-agent researchers named: compression credit is causally entangled — the same downstream failure needs opposite explanations depending on whether the bad state came from a tool or from memory. If your compression flattens that distinction, you can't even diagnose what broke.\n\nSo the naive answer — \"summarize the provenance too\" — reintroduces the exact scalar-collapse problem from the last post, now smuggled in through the storage layer. A summary is an average wearing a trench coat.\n\nThe better answer comes from a simple observation: **the axes have different compression economics, so don't compress them uniformly.**\n\n`freshness: 0.2, capability: 0.6` is a handful of numbers. Even across 500 steps, if you keep only the `min` (from last post), that's constant size regardless of history length. You never need to compress the scores, because `min`-reduction already bounds them.\n\n`tainted_by` sets —\n\nThis maps onto where the research is heading. The most promising long-horizon approaches have stopped treating the trajectory as prose to be summarized and started treating it as a **typed dependency graph the agent annotates as it works**, with a deterministic eviction policy that walks the graph when the token budget blows — explicitly to avoid the four pathologies of prose compaction: unpredictable lossiness, structural destruction, blocking cost, and compression-induced hallucination. A typed provenance vector *is* that annotation. 
The eviction policy for provenance is: evict lineage detail, never evict axis scores.\n\nThere's one more axis this forces you to add, and it's almost funny: **compression is itself a degradation source.** A vector reconstructed from a lossy summary is less trustworthy than one carried whole — so \"this provenance was reconstructed across a storage boundary\" is a real provenance fact that deserves its own axis. `reconstruction: 0.8` means \"these scores survived a compaction; treat the lineage as approximate.\" The provenance system has to describe its own lossiness. Turtles, but only two deep.\n\nEvery post in this series has ended up borrowing from security, and this one makes the reason explicit. Traditional taint tracking assumes deterministic program states and exact data-flow: memory locations, registers, string matches. LLM agents break all of that — untrusted content gets *rewritten, summarized, and used to choose later actions*, so \"did this bad input reach that sink\" is a question about semantic and causal influence, not byte-level flow. The agent security researchers building taint trackers for exactly this case had to redefine propagation to include semantic transformation and cross-session persistence through memory — which is the same two problems this post is about (enforcement and persistence), arrived at from the attack side instead of the reliability side.\n\nThat convergence is the tell. When the reliability people and the security people independently reinvent the same structure — unforgeable gating plus provenance that survives memory — it's because it's the actual shape of the problem, not a preference.\n\nFour posts, one arc:\n\nThe through-line, one more time: agent reliability is a provenance problem, and provenance is a solved discipline — capability security, data lineage, taint analysis — that we're re-deriving because the untraceable thing now acts, and acts through a bounded, forgetful, non-deterministic memory. 
The novelty isn't the primitives. It's that they now have to hold under compression and under a model that can route around anything you merely *ask* it to respect.\n\nIf you're building this: gate at a boundary the actor can't skip (type or proxy), compress scores losslessly and lineage lossily, and add a `reconstruction` axis the day your provenance crosses a storage line. Start there.\n\n*Credit, again, to the comment section that wrote the spec: **mote** (compression across the storage boundary, the edge/bounded-context framing that motivates the whole second half), **Mykola Kondratiuk** (enforcement is the hard part, not the model), plus **Tae Kim**, **Nazar Boyko**, **Ken**, and **Ahmet Özel** for sharpening the axis rules in the last thread. Open question for this one: has anyone actually run provenance across a compaction boundary in production and measured what the gate decisions do on the reconstructed vector versus the original? That's the experiment I don't have data for yet — and it's the one that decides whether any of this holds.*", "url": "https://wpnews.pro/news/your-provenance-vector-dies-at-the-storage-boundary", "canonical_source": "https://dev.to/p0rt/your-provenance-vector-dies-at-the-storage-boundary-4cc", "published_at": "2026-07-01 11:58:09+00:00", "updated_at": "2026-07-01 12:19:11.324797+00:00", "lang": "en", "topics": ["ai-agents", "ai-safety", "developer-tools"], "entities": ["Mykola"], "alternates": {"html": "https://wpnews.pro/news/your-provenance-vector-dies-at-the-storage-boundary", "markdown": "https://wpnews.pro/news/your-provenance-vector-dies-at-the-storage-boundary.md", "text": "https://wpnews.pro/news/your-provenance-vector-dies-at-the-storage-boundary.txt", "jsonld": "https://wpnews.pro/news/your-provenance-vector-dies-at-the-storage-boundary.jsonld"}}