AI-Assisted AuthZ Review: Reading Permission Boundaries in Ory Kratos

A developer used AI to generate authorization hypotheses for Ory Kratos, then manually killed each one, producing a kill table instead of false-positive reports. The repo-only review produced no findings: all five hypotheses (admin API authorization, cross-tenant reads, token reuse, settings-flow identity confusion, and tenant-from-payload) were killed, either as by-design behavior or by the implementation. The approach inverts typical AI security scanning by prioritizing hypothesis elimination over bug discovery.

Second in a series on using AI to review authorization, not to spray reports. Companion reference: AuthZ Smell Catalog.

The cheapest thing an AI can do in security is generate suspicion. Point a model at a codebase and it will hand you fifty "possible IDORs" before you finish your coffee. Almost all of them are wrong: guarded three lines up, scoped at the data layer, or protected at a boundary the model never saw. That flood is exactly why several bug bounty programs spent 2026 tightening or pausing: they were drowning in confident, plausible, wrong reports.

So this review inverts the usual loop. The AI's job is not to find bugs; it is to over-generate hypotheses cheaply. My job is to kill them. What survives that killing is the only thing worth a human's time, and the record of what died is more useful than the record of what lived. The artifact of an honest review is therefore not a finding. It's a kill table.

This review's tier is repo only, and I say so explicitly rather than implying it reaches a live product.

What this review does and does not claim.
In this limited, repo-only review, the hypotheses I tested were killed. This is not a claim that Kratos has no vulnerabilities, and it is not a security audit. It is a case study in how AI-assisted AuthZ review can avoid false positives: how to kill a suspicion instead of shipping it.

I let cheap finders over-generate against the AuthZ Smell Catalog. The raw candidate list, unfiltered:

- H1: the admin API is missing authorization.
- H2: cross-identity / cross-tenant read, via /sessions/whoami, identity lookups, or an admin identity fetch?
- H3: recovery/verification token reuse.
- H4: settings-flow identity confusion.
- H5: tenant assignment from the request payload.

Five confident hypotheses. This is the part AI is good at. Now the part it can't do.

The rule: assume each is by-design until a concrete test says otherwise, and default to killing it. For source-only review, the "test" is: can I trace a …

H1: Admin API "missing" authorization → KILLED (by design). Kratos deliberately ships the admin API with no built-in authorization.
Ory's own documentation states that the admin API must be protected at the network boundary (ingress, a reverse proxy, Oathkeeper) and never exposed publicly. So "no authz check in the handler" is not a missing guard; it is the guard living one layer out, exactly the false-positive shape in Catalog §13 (middleware/deployment-layer authorization). A report of "admin API allows identity CRUD without auth" would describe by-design behavior and be closed as such. Killed.

H2: Cross-identity / cross-tenant read → KILLED (chokepoint design). This is the interesting one. Kratos does not scatter tenant checks across handlers. Its persistence layer runs every query through a network Contextualizer that injects the network id (nid) into the SQL, so the data-access layer itself filters by tenant, centrally. A handler cannot accidentally read across the boundary, because the boundary is enforced below the handler, at the one place every read funnels through. On the public API, identity access is derived from the session's identity, never from a client-supplied id. To break H2 you would have to find a read path that bypasses the persister entirely, and I found no user-reachable one in this build. Killed. It is also worth noting as a pattern: concentrating the tenant filter at the data-access layer collapses the whole class into a single auditable point, which is why these particular hypotheses died here (Catalog §B).

H3: Token reuse → KILLED. Recovery and verification tokens are single-use and time-boxed; redemption invalidates the token in the same transaction, so replay after use fails. Killed.

H4: Settings-flow identity confusion → KILLED. The settings flow binds to the identity resolved from the authenticated session. The identity being modified is not taken from client input, so you cannot retarget the flow at someone else's traits. Killed (Catalog §02: read-reachability is not write-reachability, and here even read is session-bound).

H5: Tenant from payload → KILLED.
The network id is derived from context, not from the request body, so an admin create/update cannot smuggle a foreign nid. Killed.

The deliverable of the whole review, on one screen:

| ID | Hypothesis | Catalog | Verdict | Why it died |
|---|---|---|---|---|
| H1 | Admin API missing authz | §01, §13 | by-design | authz is at the network boundary, not the handler (documented) |
| H2 | Cross-identity / cross-tenant read | §04, §05 | defended | nid enforced at the persister via the Contextualizer; public reads are session-bound |
| H3 | Recovery/verification token reuse | §09 | defended | single-use, time-boxed, invalidated on redemption |
| H4 | Settings-flow identity confusion | §02, §07 | defended | flow bound to the session identity, not client input |
| H5 | Tenant assignment from payload | §04 | defended | nid from context, not request body |

Five hypotheses in. Zero findings out. This is a successful review, not a failed one; to be exact, it is a successful review of five hypotheses, not a clean bill of health for Kratos.

Two shapes here generalize far beyond Kratos:

None of the hypotheses I tested survived source-only review, and the reason is worth publishing: Kratos concentrates its tenant boundary in one place (the persister's Contextualizer) and derives identity from the session rather than from client input. Those design choices are precisely what made four of my five hypotheses collapse to one question, and that question had a clean answer. If I were to keep going, the only honest next move would be to enumerate every ingress that could reach persisted data without the persister: background jobs, imports, any raw query. In the OSS build I found no user-reachable one. That negative result is real signal, and its tier is repo only: I am not claiming it holds against any specific hosted deployment.

repo only is not hosted confirmed. Say which one you have.
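The kill table can also be kept as data rather than prose, which makes "say which one you have" mechanical. Below is a minimal Go sketch under stated assumptions: the type and function names (Hypothesis, Tier, Triage) are hypothetical illustrations, not part of any tool used in this review. Two choices encode the rules from above: a row only survives if someone recorded a concrete source trace, so the zero value is a kill, and the zero-value tier is repo only, so a row that forgets its tier makes the weaker claim.

```go
package main

import (
	"fmt"
	"strings"
)

// Tier says how far a result has actually been reproduced (hypothetical schema).
// RepoOnly is the zero value, so an unset tier defaults to the weaker claim.
type Tier int

const (
	RepoOnly        Tier = iota // traced in source only
	HostedConfirmed             // reproduced against a specific live deployment
)

func (t Tier) String() string {
	switch t {
	case RepoOnly:
		return "repo only"
	case HostedConfirmed:
		return "hosted confirmed"
	default:
		return "unknown tier"
	}
}

// Hypothesis is one row of a kill table (hypothetical field names).
type Hypothesis struct {
	ID      string
	Claim   string
	Catalog []string // AuthZ Smell Catalog sections
	Verdict string   // "by-design" or "defended" when killed
	Why     string   // why it died
	Trace   string   // concrete source trace that defeats the guard; "" = none found
	Tier    Tier
}

// Survives encodes "default to killing it": only a recorded trace keeps a
// hypothesis alive, so an empty Trace is a kill.
func (h Hypothesis) Survives() bool { return h.Trace != "" }

// Triage splits rows into survivors (worth a human's time) and kills.
func Triage(rows []Hypothesis) (survivors, killed []Hypothesis) {
	for _, h := range rows {
		if h.Survives() {
			survivors = append(survivors, h)
		} else {
			killed = append(killed, h)
		}
	}
	return survivors, killed
}

func main() {
	rows := []Hypothesis{
		{ID: "H1", Claim: "Admin API missing authz", Catalog: []string{"§01", "§13"},
			Verdict: "by-design", Why: "authz is at the network boundary"},
		{ID: "H2", Claim: "Cross-identity / cross-tenant read", Catalog: []string{"§04", "§05"},
			Verdict: "defended", Why: "nid enforced at the persister"},
		{ID: "H3", Claim: "Recovery/verification token reuse", Catalog: []string{"§09"},
			Verdict: "defended", Why: "single-use, invalidated on redemption"},
		{ID: "H4", Claim: "Settings-flow identity confusion", Catalog: []string{"§02", "§07"},
			Verdict: "defended", Why: "flow bound to the session identity"},
		{ID: "H5", Claim: "Tenant assignment from payload", Catalog: []string{"§04"},
			Verdict: "defended", Why: "nid from context, not request body"},
	}
	survivors, killed := Triage(rows)
	// prints "5 hypotheses in, 0 findings out" first
	fmt.Printf("%d hypotheses in, %d findings out\n", len(rows), len(survivors))
	for _, h := range killed {
		fmt.Printf("%s %s [%s] %s: %s (%s)\n",
			h.ID, h.Claim, strings.Join(h.Catalog, ", "), h.Verdict, h.Why, h.Tier)
	}
}
```

Every kill line prints its tier, so a repo-only result cannot silently read as a hosted one.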
Conflating repo only with hosted confirmed is how OSS reading turns into a false bounty claim.

Each kill sharpened a catalog entry's confirm/kill column, the column that separates a real bug from by-design behavior:

The catalog is not a static list; every real outcome, even a clean by-design result, feeds a sharper kill test back into it. That feedback loop is the asset, not the entry count.

This review produces exactly one row for the outcome ledger, the honest kind: a defended target.

- date=2026-07-04
- program=Ory Kratos (self-directed OSS review)
- source type=oss (source available)
- class=tenant boundary
- repro tier=repo only
- human verdict=by design
- final status=not applicable
- payout usd=0
- lesson="Contextualizer/nid chokepoint concentrates the tenant boundary; admin-API authz is deployment-layer by design. Both are KILLs, not bugs. Review collapses to: can anything reach persisted data without the persister?"

Row 1 in a ledger is not supposed to be a payout. It's supposed to be true.

From here, the next step is a single source-available target with a newly added permission boundary (a fresh RBAC, workspace, billing, or SSO/SCIM feature: the un-picked-over surface), run through the same over-generate-then-kill loop and logged as ledger row 2. One target. Not ten.

I use AI to reject candidates and humans to verify the few that survive. If that approach is useful to you, the AuthZ Smell Catalog is the companion reference this series builds on.