Second in a series on using AI to review authorization — not to spray reports.
Companion reference: AuthZ Smell Catalog.
The cheapest thing an AI can do in security is generate suspicion. Point a model at a
codebase and it will hand you fifty "possible IDORs" before you finish your coffee. Almost
all of them are wrong — guarded three lines up, scoped at the data layer, or protected at a
boundary the model never saw.
That flood is exactly why several bug bounty programs spent 2026 tightening or pausing:
they were drowning in confident, plausible, wrong reports.
So this review inverts the usual loop. The AI's job is not to find bugs — it is to
over-generate hypotheses cheaply. My job is to kill them. What survives that killing
is the only thing worth a human's time, and the record of what died is more useful than the
record of what lived.
The artifact of an honest review is therefore not a finding. It's a kill table.
The evidence tier for this review is repo_only, and I say so explicitly rather than implying
it reaches a live product.
What this review does and does not claim. In this limited, repo-only review, the
hypotheses I tested were killed. This is not a claim that Kratos has no vulnerabilities,
and it is not a security audit. It is a case study in how AI-assisted AuthZ review can
avoid false positives: how to kill a suspicion instead of shipping it.
I let cheap finders over-generate against the AuthZ Smell Catalog. The raw candidate
list, unfiltered:
/sessions/whoami, identity lookups, or an admin identity fetch?
Five confident hypotheses. This is the part AI is good at. Now the part it can't do.
The rule: assume each is by-design until a concrete test says otherwise, and default to killing it. For source-only review, the "test" is: can I trace a
H1 — Admin API "missing" authorization → KILLED (by design).
Kratos deliberately ships the admin API with no built-in authorization. Ory's own
documentation states the admin API must be protected at the network boundary (ingress, a
reverse proxy, Oathkeeper) and never exposed publicly. So "no authz check in the handler" is
not a missing guard — it is the guard living one layer out, exactly the false-positive shape
in Catalog §13 (middleware/deployment-layer authorization). A report of "admin API allows
identity CRUD without auth" is by-design and would be closed as such. Killed.
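A minimal sketch of that triage step, assuming a hypothetical Route record and a set of listeners I treat as publicly exposed. In a repo-only review that set comes from documented deployment guidance, not from observing a live system, and none of these names come from Kratos itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    path: str
    listener: str               # hypothetical labels: "public" or "admin"
    handler_checks_authz: bool  # did the handler itself contain a guard?


def triage_missing_authz(route: Route, publicly_exposed: set[str]) -> str:
    """Classify a 'no authz check in the handler' candidate."""
    if route.handler_checks_authz:
        return "killed: guarded in handler"
    if route.listener not in publicly_exposed:
        # The guard lives one layer out (ingress, reverse proxy), as documented.
        return "killed: by-design, deployment-layer authz"
    return "survives: needs human verification"


admin_route = Route("/admin/identities", listener="admin", handler_checks_authz=False)
print(triage_missing_authz(admin_route, publicly_exposed={"public"}))
```

The same candidate only survives if the admin listener is known to be publicly exposed, which is a deployment fact this review tier cannot establish.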
H2 — Cross-identity / cross-tenant read → KILLED (chokepoint design).
This is the interesting one. Kratos does not scatter tenant checks across handlers. Its
persistence layer runs every query through a network Contextualizer that injects the
network id (nid) into the SQL: the data-access layer itself filters by tenant, centrally.
A handler cannot accidentally read across the boundary, because the boundary is enforced
below the handler, at the one place every read funnels through. On the public API, identity
access is derived from the session's identity, never from a client-supplied id. To break
H2 you would have to find a read path that bypasses the persister entirely — and I found no
user-reachable one in this build. Killed. And worth noting as a pattern: concentrating the
tenant filter at the data-access layer collapses the whole class into a single auditable
point — which is why these particular hypotheses died here (Catalog §B).
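To make the chokepoint shape concrete, here is a small sketch using Python's sqlite3. It is not Kratos's code: the Persister class, the identities table, and the column names are all assumptions for illustration. The only point is that the nid filter is appended in exactly one method.

```python
import sqlite3


class Persister:
    """Hypothetical data-access layer: every read goes through _scoped()."""

    def __init__(self, conn: sqlite3.Connection, nid: str):
        self._conn = conn
        self._nid = nid  # taken from server-side context, never from the request

    def _scoped(self, where: str, params: tuple) -> list:
        # `where` is a developer-written fragment, not user input.
        # The tenant filter is appended here, in one place, for every read.
        sql = f"SELECT id, email FROM identities WHERE ({where}) AND nid = ?"
        return self._conn.execute(sql, (*params, self._nid)).fetchall()

    def get_identity(self, identity_id: str):
        rows = self._scoped("id = ?", (identity_id,))
        return rows[0] if rows else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE identities (id TEXT, nid TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO identities VALUES (?, ?, ?)",
    [("a1", "net-A", "alice@example.com"), ("b1", "net-B", "bob@example.com")],
)

tenant_a = Persister(conn, nid="net-A")
print(tenant_a.get_identity("a1"))  # own tenant: ('a1', 'alice@example.com')
print(tenant_a.get_identity("b1"))  # other tenant: None
```

With this shape, the audit question is no longer "does every handler check the tenant?" but "does anything read without going through _scoped()?".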
H3 — Token reuse → KILLED.
Recovery and verification tokens are single-use and time-boxed; redemption invalidates the
token in the same transaction. Replay after use fails. Killed.
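The property I was checking for can be sketched as a single conditional UPDATE, again with sqlite3. The tokens table and the redeem function are hypothetical, not Kratos's schema; the sketch only shows why a replay fails when the validity check and the invalidation happen in one statement inside one transaction.

```python
import sqlite3
import time


def redeem(conn: sqlite3.Connection, token: str, now: float) -> bool:
    """Atomically consume a token; True only on the first, unexpired use."""
    with conn:  # one transaction: commit on success, rollback on error
        cur = conn.execute(
            "UPDATE tokens SET used = 1 "
            "WHERE token = ? AND used = 0 AND expires_at > ?",
            (token, now),
        )
        # rowcount is 1 only if this call flipped the token from unused to used.
        return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tokens (token TEXT PRIMARY KEY, used INTEGER, expires_at REAL)"
)
now = time.time()
conn.execute("INSERT INTO tokens VALUES ('fresh', 0, ?)", (now + 900,))
conn.execute("INSERT INTO tokens VALUES ('stale', 0, ?)", (now - 1,))
conn.commit()

print(redeem(conn, "fresh", now))  # first use succeeds
print(redeem(conn, "fresh", now))  # replay is rejected
print(redeem(conn, "stale", now))  # expired token is rejected
```

The smell to look for in source is the opposite shape: a SELECT that validates the token followed by a separate, later write that marks it used.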
H4 — Settings-flow identity confusion → KILLED.
The settings flow binds to the identity resolved from the authenticated session. The identity
being modified is not taken from client input, so you cannot retarget the flow at someone
else's traits. Killed (Catalog §02 — read-reachability is not write-reachability, and here
even read is session-bound).
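A sketch of what "bound to the session identity" means in practice, with a hypothetical Session type and an in-memory store standing in for the real flow. The client controls what changes; it never controls whose record changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    identity_id: str  # resolved server-side from the authenticated session


# In-memory stand-in for the identity store (illustrative only).
IDENTITIES = {
    "alice": {"email": "alice@example.com"},
    "bob": {"email": "bob@example.com"},
}


def update_traits(session: Session, body: dict) -> str:
    """Apply trait changes to the session's identity only."""
    target = session.identity_id     # the only source of the target identity
    traits = body.get("traits", {})  # client-supplied values for that identity
    # Any "identity_id" key in the body is never read, so it cannot retarget.
    IDENTITIES[target].update(traits)
    return target


changed = update_traits(
    Session("alice"),
    {"identity_id": "bob", "traits": {"email": "new@example.com"}},
)
print(changed, IDENTITIES["bob"]["email"])  # alice bob@example.com
```

The hypothesis would have survived if the handler had read the target from the body, a path parameter, or a flow field the client could rewrite.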
H5 — Tenant from payload → KILLED.
The network id is derived from context, not from the request body. An admin create/update
cannot smuggle a foreign nid. Killed.
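On the write side, the same idea looks like an allow-list plus a context-supplied tenant. This is a hedged sketch with hypothetical field names, not the project's actual create path.

```python
ALLOWED_FIELDS = {"id", "email"}  # hypothetical client-writable columns


def build_identity_row(ctx_nid: str, payload: dict) -> dict:
    """Build the row to persist; the tenant always comes from context."""
    row = {k: v for k, v in payload.items() if k in ALLOWED_FIELDS}
    row["nid"] = ctx_nid  # a payload "nid" was dropped by the allow-list above
    return row


row = build_identity_row("net-A", {"id": "c1", "email": "c@example.com", "nid": "net-B"})
print(row["nid"])  # net-A
```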
The deliverable of the whole review, on one screen:
| # | Hypothesis | Catalog | Verdict | Why it died |
|---|---|---|---|---|
| H1 | Admin API missing authz | §01, §13 | by-design | authz is at the network boundary, not the handler (documented) |
| H2 | Cross-identity / cross-tenant read | §04, §05 | defended | nid enforced at the persister via the Contextualizer; public reads are session-bound |
| H3 | Recovery/verification token reuse | §09 | defended | single-use, time-boxed, invalidated on redemption |
| H4 | Settings-flow identity confusion | §02, §07 | defended | flow bound to the session identity, not client input |
| H5 | Tenant assignment from payload | §04 | defended | nid from context, not request body |
Five hypotheses in. Zero findings out. This is a successful review, not a failed one —
and to be exact, it is a successful review of five hypotheses, not a clean bill of health
for Kratos.
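The kill-by-default rule behind this table can be sketched as a small loop. Everything here is illustrative: the Hypothesis type and the trace callables are my own stand-ins for "a human traced the path and recorded what they saw", not a tool used in this review.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Hypothesis:
    hid: str
    claim: str
    # Returns (confirmed, note). `confirmed` is True only with concrete evidence.
    trace: Callable[[], tuple[bool, str]]


def review(hypotheses: list[Hypothesis]) -> list[dict]:
    """Build a kill table: a hypothesis is killed unless its trace confirms it."""
    table = []
    for h in hypotheses:
        confirmed, note = h.trace()
        verdict = "survives" if confirmed else "killed"
        table.append({"id": h.hid, "claim": h.claim, "verdict": verdict, "why": note})
    return table


candidates = [
    Hypothesis("H1", "Admin API missing authz",
               lambda: (False, "authz is at the network boundary (documented)")),
    Hypothesis("H3", "Recovery token reuse",
               lambda: (False, "single-use, invalidated on redemption")),
]
kill_table = review(candidates)
survivors = [row for row in kill_table if row["verdict"] == "survives"]
print(len(kill_table), len(survivors))  # 2 0
```

The design choice worth copying is the default: an inconclusive trace produces a kill with a recorded reason, never a report.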
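Two shapes here generalize far beyond Kratos: authorization that lives at the deployment
layer rather than in the handler (H1), and a tenant boundary concentrated at a single
data-access chokepoint (H2).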
None of the hypotheses I tested survived source-only review — and the reason is worth
publishing: Kratos concentrates its tenant boundary in one place (the persister's
Contextualizer) and derives identity from the session rather than from client input. That
design choice is precisely what made four of my five hypotheses collapse to one question, and
that question had a clean answer.
If I were to keep going, the only honest next move would be to enumerate every ingress that
could reach persisted data without the persister — background jobs, imports, any raw query.
In the OSS build I found no user-reachable one. That negative result is real signal, and it
is tier repo_only: I am not claiming it holds against any specific hosted deployment.
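One rough way to start that enumeration is a text scan for SQL-looking strings outside the persistence package. This is a crude sketch under stated assumptions: the directory name "persistence", the .go extension, and the regex are all guesses at a reasonable starting filter. It will miss query builders and flag harmless strings, so every hit still needs a human read.

```python
import re
from pathlib import Path

# Very rough pattern for inline SQL; an assumption, not a parser.
RAW_SQL = re.compile(
    r"\b(SELECT|INSERT|UPDATE|DELETE)\b.+\b(FROM|INTO|SET)\b", re.IGNORECASE
)


def raw_query_sites(root: Path, persister_dir: str = "persistence") -> list[tuple[str, int]]:
    """List (file, line) pairs with SQL-looking text outside the persister."""
    hits = []
    for path in sorted(root.rglob("*.go")):
        rel = path.relative_to(root)
        if persister_dir in rel.parts:
            continue  # queries inside the chokepoint are expected
        text = path.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if RAW_SQL.search(line):
                hits.append((rel.as_posix(), lineno))
    return hits
```

Called as raw_query_sites(Path("path/to/checkout")), an empty list is weak evidence that reads funnel through the persister; a non-empty list is a reading list, not a finding.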
repo_only is not hosted_confirmed. Say which one you have. Conflating them is how OSS reading
turns into a false bounty claim.
Each kill sharpened a catalog entry's confirm/kill column, the column that separates a real
bug from a by-design behavior.
The catalog is not a static list; every real outcome — even a clean by-design result — feeds a
sharper kill test back into it. That feedback loop is the asset, not the entry count.
This review produces exactly one row for the outcome ledger — the honest kind, a defended
target:
date=2026-07-04, program=Ory Kratos (self-directed OSS review), source_type=oss_source_available,
class=tenant_boundary, repro_tier=repo_only, human_verdict=by_design, final_status=not_applicable,
payout_usd=0, lesson="Contextualizer/nid chokepoint concentrates the tenant boundary; admin-API
authz is deployment-layer by design — both are KILLs, not bugs. Review collapses to: can anything
reach persisted data without the persister?"
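A ledger row like this is easy to keep honest if the tier is validated when the row is created. The sketch below is one possible shape, not the format I am committed to: the LedgerRow class is hypothetical, bug_class stands in for the class field because class is a reserved word in Python, and the tier set contains only the two tiers named in this post.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

KNOWN_TIERS = {"repo_only", "hosted_confirmed"}  # only the tiers named in this post


@dataclass(frozen=True)
class LedgerRow:
    date: str
    program: str
    source_type: str
    bug_class: str  # the ledger's "class" field
    repro_tier: str
    human_verdict: str
    final_status: str
    payout_usd: int
    lesson: str

    def __post_init__(self):
        # Refuse rows that do not say which evidence tier they rest on.
        if self.repro_tier not in KNOWN_TIERS:
            raise ValueError(f"unknown repro_tier: {self.repro_tier!r}")


def to_csv(rows: list[LedgerRow]) -> str:
    """Serialize ledger rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(LedgerRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()


row_1 = LedgerRow(
    date="2026-07-04",
    program="Ory Kratos (self-directed OSS review)",
    source_type="oss_source_available",
    bug_class="tenant_boundary",
    repro_tier="repo_only",
    human_verdict="by_design",
    final_status="not_applicable",
    payout_usd=0,
    lesson="Contextualizer/nid chokepoint concentrates the tenant boundary",
)
print(to_csv([row_1]))
```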
Row #1 in a ledger is not supposed to be a payout. It's supposed to be true. From here, the
next step is a single source-available target with a newly-added permission boundary
(a fresh RBAC, workspace, billing, or SSO/SCIM feature) — the un-picked-over surface — run
through the same over-generate-then-kill loop, and logged as ledger row #2. One target. Not ten.
I use AI to reject candidates and humans to verify the few that survive. If that approach is useful to you, the AuthZ Smell Catalog is the companion reference this series builds on.